.  j 


Cop 

*  *  A  Final  Report 
Grant  No.  AFOSR-85-97-0363^ 
September  1, 1985  -  October  31, 1987 


y  8  *  '/  0  i  ^ 


A  SENSOR  WITH  BIOLOGICAL  PREPROCESSING  FEATURES 


Rafael  M.  Inigo 
Professor 


Submitted  to: 

Air  Force  Office  of  Scientific  Research 
Building  412 
Bolling  Air  Force  Base 
Washington,  D.C.  20332-6448 

Submitted  by: 


Eugene  S.  McV'ev 
Professor 


Jonathan  F.  Doner 
Research  Assistant  Professor 
Biomedical  Engineering 

Chiewcharn  Narathong 
Graduate  Student 


Jay  I.  Minnix 
Graduate  Student 


John  Sigda 
Graduate  Student 

Report  No.  UVA/525679/EE88/101 
December  1987 


BBS 


2  4 1988 


I 


S. 


SCHOOL  OF  ENGINEERING  AND 
APPLIED  SCIENCE 


88  2  £4  120 


UNIVERSITY  OF  VIRGINIA 
CHARLOTTESVILLE.  VIRGINIA  22901 


REPORT  DOCUMENTATION  PAGE 


j  'a  ^irQRT  ScCijRlTY  CLASSiriCATlONi 

I  Unclassified 


la.  icCwf^lTY  CLASSlFfCAnON  AUTHORITY 


j  *:d  J£C^SSiPiCATlON/OOWNGRAOtN<3  SCHcOULt 

4 


ORMING  ORGANIZATION  REPORT  NUMBtR(S) 

i/525679/EE87  ■ 


6c.  ADOSE5S  iCty,  State,  ana  ZlPCoae) 

Thornton  Hall 

C^Tar]ottes vi  1  le -  VA  22901 


So  OFPICH  SYMBOL 
(If  appiicao/e) 


AME  OF  i^UNDING /SPONSORING 

RGANizAr;oN  /\-j ^  Force  Office 
Scientific  Research/NE 


Sc.  AOOR£SS(Gty,  State,  and  iIPCoae) 

Building  410 

Bolling  Air  Force  Base 

Washington,  DC  20332-6448 


n  Title  ((nc/uo*  Security  Classitication) 

A  Sensor  with  Biological  Preprocessing  Features 


W  ^cssoNAL  AurHOR(S) 

R.  M.  Inigo,  E.  S.  McVey,  J.  F.  Doner 


13a.  -YPE  OF  REPORr 

Final 


10.  restrictive  .MARiONGS 

None 


OlSTRIBUTlQiV/ A.AlLAilLiTY  OF  REFCRT 

Approved  for  public  release,  distribution 
unlimited. 


5.  MONITORING  ORGANIZATION  REPORT  NUMBERlS) 

AFOSR.tr.  8  8  -  0  048 


7a.  NAME  OP  MONITORING  ORGANIZATION 

Aros(^ 


7b.  ADDRESS  iO  ty,  State,  and  ZIP  Code) 

Building  410,  Bolling  Air  Force  Base 
Washington,  DC  20332-6448 


Sb.  OFFICE  SYMBOL  9.  PROCUREMENT  INSTRUMENT  IDENTIFICATION  NUMBER 

AFCSR-as-Z-osea.^ 


10  SOURCE  OF  FUNDING  NUMBERS 


PROGRAM 
ELEMENT  NO 


Uimr 


|t3b  TIME  COVERED 

1 

14  DATS  OF  REPORT  (Year,  Month,  Day} 

18  December  1987 

COSATl  COOES 


GROUP  !  SUB-GROUP 


8.  SUBJECT  TERMS  (Continue  on  reverse  if  necessary  and  identify  by  blo^k  numoerj 


'9  abstract  (Continue  on  reverse  <r  necessary  anof  identify  by  bio<k  numoer) 

' — This  report  discusses  research  on  "A  Sensor  with  Biological  Preprocessing  Features" 
during  the  period  February  1,  1986  to  October  31,  1987. 

The  first  part  of  the  report  reviews  the  basic  geometric  sensor  configuration  and  its 
mapping  to  computation  plane.  Three  basic  configurations  are  discussed  and  their  application 
to  edge  and  short  range  motion  detection  follows. 

The  next  section  covers  a  qualitative  long  range  motion  detection  algorithm.  The 
algorithm  was  tested  first  with  synthetic  images  and  next  with  real  images,  and  in  both  cases 
the  results  were  very  satisfactory.  The  algorithm  is  now  being  tested  with mul tiple  images.  A 
one  line_corre]ation  tracker  is  discussed  in  the  next  section.  The  correlation  tracker  can 
be  implemented  with  parallel  architecture  and  is,  thus,  very  fast.  Tests  have  been  performed 
jwith  satisfactory  results  using  real  images.  Pattern  recognition  using  the  3VS  together  with 
ia  class  of  transforms  which  are  invariant  to  translation  is  discussed.  It  should  be  possible^ 


.0  O.T'R  BuriON/ availability  of  abstract 

□  J'iC.ASSiFiEOnjNLIMITED  □  SAME  AS  RPT  □  OTIC  USERS 


'.A'/E  OF  RESPONSIBLE  ‘/OlVlOUAL 

Dr.  C.  Lee  Giles 


21.  ABSTRACT  SECURITY  CLASSiFICATlON 

□  otic  USERS  Unclassified  _ 


OD  FORM  1473,  84  mar  ARR  ea/tion  may  oe  ui«o  until  e 

All  oth*r  editionj  are  oosolete 


SECURITY  ri ASSIFiCATIQN  of  This  RAGE 


A  Final  Report 

Grant  No.  AFOSR-85-03-0363-B 
September  1,  1985  -  October  31,  1987 

A  SENSOR  WITH  BIOLOGICAL  PREPROCESSING  FEATURES 

Submitted  to: 

Air  Force  Office  of  Scientific  Research 
Building  412 
Boll ing  Ai r  Force  Base 
Washington,  D.C.  20332-6448 

Attention:  Dr.  C.  Lee  Giles 
Submitted  by: 

Rafael  M.  Inigo 
Professor 

Jonathon  F.  Doner 
Research  Assistant  Professor 
Biomedical  Engineering 

Chiewcharn  Narathong 
Graduate  Student 

Zia-ur  Rahman 
Graduate  Student 

John  Sigda 
Graduate  Student 


Eugene  S.  McVey 
Professor 


Jay  Minnix 
Graduate  Student 

William  I.  Clement 
Graduate  Student 


Department  of  Electrical  Engineering 
SCHOOL  OF  ENGINEERING  AND  APPLIED  SCIENCE 
UNIVERSITY  OF  VIRGINIA 
CHARLOTTESVILLE,  VIRGINIA 


Accession  For 


NTIS  ORAil 
DTIC  TAB 

Unannounced 

Justification. 


I 


□ 


By- 


Plstrlbutlon/ 


Availability  Codes 


Dlst 


jAvaiT  and/or 
Special 


Reoort  No.  UVA/525679/EE88/1 01 


Copy  No. 


J". 

•  * 


SA'LVl.W 


Copy 

JwtpecTt 


4 


1^ 


f 


!• 


TABLE  OF  CONTENTS 


1.  INTRODUCTION .  1 

2.  REVIEW  OF  PREVIOUS  WORK .  3 

2.1.  Biological  Visual  Sensor  Configuration  .  3 

2.2.  Edge  and  Motion  Detection  With  the  New  Sensor .  3 

3.  LONG  RANGE  QUALITATIVE  MOTION  DETECTION  ALGO¬ 

RITHM  .  9 

3.1.  Algorithm  Testing .  10 

3.2.  Problems  with  Real  Images  .  1 1 

3.3.  Schemes  for  Preprocessing  Images .  11 

3.4.  Examples  and  Further  Discussion .  12 

4.  A  FAST  ALGORITHM  FOR  MOTION  PREDICTION .  21 

4.1.  Introduction .  21 

4.2.  The  Tracking  Algorithm .  21 

4.2.1.  2D  Motion  (a  general  case) .  27 

4.2.2.  Simulation  with  a  Gaussian  Grey  Level  Image .  27 

4.3.  Translation,  Rotation,  and  Scaling .  31 

4.3.1.  Simulations  with  a  Real  Image .  34 

4.4.  Speed  Comparison  .  36 

4.4.1.  Conventional  Correlation .  36 

4.4.2.  One-Dimensional  Correlation .  42 

4.5.  Conclusion  .  43 


5.  pattern  recognition  using  the  bvs,  the  C-TRANSFORM 


I 


I 


fc-;. 


Sv 

K:X 


Ln  ' 


.V 

,v 


KV. 


AND  A  NEURAL  NETWORK  CLASSIHER . 

5.1.  Introduction . 

5.2.  Background . 

5.3.  R  and  M  Transform  Advantages  and  Disadvantages . 

5.4.  Computer  Simulation  Results . 

5.5.  Simulation  of  Two-Dimensional  Case . 

5.6.  Use  of  Transforms  With  BVS  Sensor . 

5.7.  Network  Structure  and  Weight  Matrix  . 

5.8.  Reduced  Transform  Representation  . 

5.9.  Extension  into  Two  Dimensions . . . 

5.10.  Implementation  of  Learning  in  an  ANS . 

6.  DUAL  SENSOR  IMPLEMENTATION  AND  ANALYSIS . 

6.1.  Introduction . 

6.2.  Sensor  Technology  . 

6.3.  Method  for  Obtaining  Dual  Representations  of  Images . 

6.4.  A  Method  to  Determine  which  Solution  is  Best . 

6.5.  Analysis  of  Enor  for  Possible  Solutions . 

6.6  A  Possible  Hardware  Implementation  of  Single  Camera/Dual  Sensor 

6.6.1  Operation  of  the  Analog  Summing  Arrangement . 

6.6.2  Operation  of  the  Digital  Summing  Arrangement  . 

6.7  Error  Analysis . 

il 


i' 


7.  DEPTH  COMPUTATION  FROM  OPTIC  FLOW 


7.1.  Introduction 


7.2.  Overview  of  Current  Research 


7.3.  Computation  of  Optic  Flow 


7.4.  Physiological  Correlates  of  Optic  Flow  Computation 

7.5.  An  Approach  to  3D  Computation  from  EM . 


8.  VESTIBULAR-OCULAR  MOTION  FOR  TARGET  TRACKING 


APPENDIX  6. A  :  ERROR  GRAPHS 


LIST  OF  nCURES 


Fig.  2.1(a)  Image  and  Computation  Planes.  Arc-of-ring . 

Fig.  2.1(b)  Image  and  Computation  Planes.  Circular  elements . 

Fig.  2.1(c)  Image  and  Computation  Planes.  Hexagonal  elements . 

Fig.  2.2  Edge  Detection  Using  BVS  . 

Fig.  2.3(a)  Original  image:  a  noisy  image  of  a  lady’s  face  . 

Fig.  2.3(b)  Simulated  circular-element  sensor  image,  image  plane  . 

Fig.  2.3(c)  Edge  detection  by  Laplacian-Gaussian  method,  with  thresholding 

comp,  plane . 

Fig.  2.3(d)  Edge  detection  by  Laplacian  method,  with  thresholding,  comp 

plane  . 

Fig.  2.4  Motion  Detection  by  the  BVS  . 

Fig.  2.5  Short-range  Motion  Detection . 

Fig.  2.6  Motion  Detection  by  the  New  Sensor 

(a)  A  circle  inside  a  square  . 

(b)  The  detection  motion  of  a  circle  and  a  square  . 

Fig.  3. 1  Car  in  initial  and  moved  position  . 

Fig.  3.2  Car  in  initial  and  moved  position  after  preprocessing . 

Fig.  3.3  Difference  Image . 

Fig.  3.4  Car  in  initial  and  moved  positions . 

Fig.  3.5  Image  after  preprocessing . 

Fig.  3.6  Difference  image  . 

Fig.  4. 1  The  nonuniform  logarithmic  spiral  sensor,  circular  elements . 

Fig.  4.2  Row  Search  Mechanism . 

Fig.  4.3  Mixed  density  Gaussian  grey  level  image . 

Fig.  4.4  Histogram  for  correlation  of  rows  k  and  k+i . 


LIST  OF  TABLES 


Table  3.1  Motion  to  the  left  and  down  . 

Table  3.2  Motion  to  the  right  and  down  . 

Table  4.1  Row  correlation  for  search  area,  a  Gaussian  image  . 

Table  4.11  A  summary  of  horizontal  and  vertical  perturbations  for  a 

Gaussian  image  . 

Table  4.in  More  simulations  for  a  Gaussian  image  . 

Table  4.rV  A  real  image  with  Ax=5.0,  Ay=-4.0,  AO- 1 1.0°,  <=.84 . 

Table  4.V(a)  First  iteration  Ax=5.0,  Ay=-4.0,  A0=1 1.0°,  <=.84 . 

Table  4.V(b)  First  iteration  Ax=5.0,  Ay=-4.0,  A0=1 1.0°,  <=.84 . 

Table  4.VI(a)  Second  iteration  Ax=5.0,  Ay=-4.0,  A0=1 1.0°,  <=.84 . 

Table  4.VI(b)  Second  iteration  Ax=5.0,  Ay=-4.0,  A0=11.0°,  <=.84 . 

Table  4.VII  A  summary  of  estimates  in  the  rectangular  plane  of  a  real  image 


1.  INTRODUCTION 


This  final  report  covers  research  done  on  "A  Sensor  with  Biological  Preprocessing 
Features"  under  the  sponsorship  of  the  Air  Force  Office  of  Scientific  Research,  covering 
the  period  February  1,  1986  to  September  30,  1987. 

As  a  result  of  the  effort,  four  students  have  obtained  the  Master  of  Science  degree  in 
Electrical  Engineering  and  one  student  the  Doctor  of  Philosophy  degree  in  Electrical 
Engineering  [1,2,3,4,5].  Eleven  papers  have  been  written,  five  of  which  have  been  pub¬ 
lished  [6,7,8,9,10],  one  is  an  invited  paper  to  be  published  shortly  [11],  one  will  be  pub¬ 
lished  shortly  [12],  and  the  rest  are  under  the  process  of  review  [13,14,15,16]. 

Research  results  obtained  through  January  1986  were  reported  in  the  Final  Report 
on  Grant  No.  AFOSR-84-0349,  "Biological  Visual  Systems  Structures  for  Machine 
Vision  Applied  to  Robotics",  Report  No.  UVAy525647/EE86/101,  February  1986  [41], 
A  brief  summary  complemented  with  results  not  reported  in  that  report  follows  in  Sec¬ 
tion  2.  The  main  body  of  this  final  report  contains  more  recent  results.  The  repon  is 
organized  in  the  order  listed  below.  For  clarity  and  convenience,  references  have  been 
listed  at  the  end  of  each  section. 

1.  Introduction 

2.  Review  of  Previous  Work 

3.  Long  Range  Qualitative  Motion  Detection  Algorithm 

4.  A  Fast  Algorithm  for  Motion  Prediction 

5.  Pattern  Recognition  Using  the  BVS,  the  C-transform  and  a  Neural  Network 
Classifier 

6.  Dual  Sensor  Implementation  and  Analysis 

7.  Depth  Computation  from  Optical  Flow 

8.  Vestibular-ocular  Motion  for  Target  Tracking 
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2.1  Biolugical  V  isual  Sensor  Configuration 

Three  basic  contigurauons  for  the  image  plane  were  considered:  arcs  of  ring  (called 
'e. ’angular  elementsT.  circular  elements  and  hexagonal  elements,  see  Fig.  2.1.  Design 
.'...C'-  rcidting  the  number  of  concenmc  rings  to  the  number  of  elements  per  ring  for  the 
t’  ree  conngurations  were  developed.  In  addition,  a  program  to  simulate  images  that 
A  '...d  be  obtained  with  these  structures,  using  real  images  obtained  with  a  conventional 
'  2  512  CCD  camera,  was  written  and  used  for  all  subsequent  work.  Based  on  the 

•;c.c  .'t  \;ew  covered  by  the  sensor  for  a  specific  optic  configuration,  the  number  of  pix- 
c.  -  .  t  the  log  sptral  sensor  is  orders  of  magnitude  smaller  than  that  of  a  conventional  rec¬ 
ta”  c-,.ar  sensor,  for  equivalent  pixel  size  in  the  fovea  region.  The  basic  properties  of  the 
it\  S  sensor  configuration,  such  as  invariance  to  scaling  and  rotation  about  the  optical 
a\:s  were  venfied  using  real  images. 

.\dditi  anil  work  not  included  in  the  repon  has  been  done  on  the  use  of  chain  coding 
ti'r  the  si.x  neighborhood  computation  plane  grid.  Several  binary  images  were  chain- 
oKied  for  comparison  with  eight  neighborhood  coding  of  conventional  images.  Due  to 
the  fact  that  the  six  neighborhood  coding  was  applied  in  the  computation  plane,  where 
.ntages  have  a  completely  different  shape  than  in  the  image  plane,  the  chain-length  was 
niM  as  shon  as  the  reduction  in  number  of  pixel  would  seem  to  indicate,  but  it  was 
significantly  shoner  than  that  of  the  conventional  sensor  images,  nevenheless.  In  addi¬ 
tion ,  the  process  is  faster  because  only  six  directions  are  used.  Once  the  images  were 
cnain  coded,  pattern  recognition  was  performed  on  them  using  chain-coded  template 
matching  which  is  more  convenient  than  other  methods  in  this  case.  For  example,  a  ctr- 
sde  has  only  one  chain  direction  and  chain  length,  irrespective  of  size,  when  it  is  centered 
on  the  optical  axis  and  the  code  does  not  change  for  other  centered  objects  either, 
irrespective  of  size  and  rotation. 

2.2.  Edge  and  Motion  Detection  with  the  New  Sensor 

This  pan  of  the  research  dealt  modelling  properties  of  the  human  peripheral  visual 
-system  iHPVS)  for  edge  and  motion  detection  and  its  application  to  the  new  sensor.  A 
good  summary  of  outstanding  research  results  by  other  investigators  in  the  area  of  model¬ 
ling  is  given  in  the  February  1986  report  and  in  [1]. 

Edge  detection  for  the  arcs-of-ring  sensor  can  be  implemented  in  the  computation- 
plane’s  rectangular  grid  using  any  of  the  standard  methods  such  as  local  neighborhood 
operator,  Laplacian-Gaussian  or  global  methods  related  to  signal  analysis  [2,3].  For  the 
circular  and  hexagonal  sensors,  the  computation-plane  array  is  staggered,  forming  a  hex¬ 
agonal  array  and  the  following  two  neighborhood  operators  can  be  used: 

=  la.-a,  I  Ijj-a,  I  la  |-a^l  +  ia  ,-a.,  I  (2.1) 

and 

These  are,  respectively,  the  absolute  value  and  the  Laplacian  methods.  The  abso¬ 
lute  value  method  does  not  produce  zero-crossings  and  is,  consequently,  less  similar  to 
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HPVS  models  than  methods  such  as  the  Laplacian-of-the-Gaussian.  In  the  Laplacian 
method  positive  and  negative  gray  level  values  around  the  regions  of  abrupt  intensity 
change  are  obtained,  and  the  zero-crossing  points  between  these  are  chosen  as  the  edges. 
In  order  to  eliminate  spurious  edges  in  noisy  images,  an  optional  thresholding  operator 
can  be  applied.  The  general  method  is  shown  in  block  diagram  in  Fig.  2.2.  Simulations 
with  synthetic  and  real  images  obtained  with  a  512  by  512  CCD  camera  and  transformed 
to  the  new  sensor  equivalent  image  by  the  technique  described  in  the  repon.  These  were 
performed  using  the  Laplacian-Gaussian  method  and  the  two  operators  of  formulas  (2.1) 
and  (2.2).  For  synthetic  images,  the  results  using  (2.1)  were  practically  identical  to  those 
of  the  Laplacian-Gaussian.  For  a  real  image,  a  noise  photograph  of  a  girl’s  face.  Fig. 
2.3(a),  the  results  for  (2.1)  and  the  L-G  are  given  in  Figs.  2.3(b)  and  (c),  respectively. 
Notice  that  the  edges  are  very  similar  in  both  cases.  These  results  correspond  to  a  low 
resolution  circular  sensor  of  36  rings,  75  elements  per  ring.  More  detailed  edges  are 
obtained  with  a  higher  resolution  simulated  sensor  of  51  rings,  120  elements  per  ring. 

For  motion  detection,  the  same  approach  was  used  as  for  edge  detection.  Biological 
motion  detection  was  first  reviewed  [1]  using  the  two  process  theory  of  HVS  and  then 
simulations  were  performed  using  the  T  and  U  channels  [4]  for  short-range  mode.  The 
procedure  for  simulation  was: 

a)  Perform  spatial  convolution  S  (jt  .y  j )  =  V^G*/  {x  .y ,/ ) 

b)  Establish  the  orientation  of  zero-crossing  contours 

c )  Apply  temporal  derivation  to  S {x ,y ,t ):  T(x,y.t)  =  dS {x ,y  ,t )/3t 

d)  To  detect  motion  at  each  zero-crossing,  a  positive  value  of  T  indicates  motion 
towards  the  negative  side  of  the  crossing;  a  negative  value  of  T  indicates  the  oppo¬ 
site.  The  direction  of  motion  is  perpendicular  to  the  edge. 

e)  All  the  unit  directional  vecton  in  an  edge  region  with  a  given  orientation  are  com¬ 
bined  into  a  single  unit  vector  pxjinting  in  that  direction.  After  this  operation  is  per¬ 
formed,  unit  directional  vectors  in  adjacent  regions  with  different  orientation  are 
combined  in  the  same  way,  proceeding  along  the  edge,  until  the  complete  closed 
edge  has  been  processed.  This  will  give  the  direction  of  motion.  The  resolution  is 
limited  to  45°  increment.  In  Fig.  2.4,  a  square  is  translated  at  an  angle  of  45°.  as 
indicated  by  the  arrow  at  the  center  of  the  square. 

The  simulation  in  short-range  mode  for  the  new  detector  is  related  to  intensity  based 
schemes  in  computer  vision.  The  simulation  procedure  was  as  follows: 

a)  Apply  the  six-neighborhood  Laplacian  operation  to  the  input  images. 

b)  Establish  the  orientation  of  zero-crossings  for  the  first  frame  and  apply  7-element 
chain  coding  to  the  zero-crossing  contour  to  segment  the  objects. 

c)  Perform  temporal  derivation  by  subtraction,  T{x,y  j)=  S{x ,v ./  +  A; ) - S (x ,> ./ ). 

d)  With  reference  to  Fig.  2.5,  in  which  positively  marked  pixels  correspond  to  high 
intensity  at  edge  boundary  and  negatively  marked  ones  to  low  intensity,  when  the 
edge  is  moving  towards  the  negative  side  as  shown  in  Figs.  2.5(a)  and  (d),  all  seven 
elements  will  become  positive  at  the  second  frame.  When  the  edge  is  moving 
towards  the  positive  side,  some  elements  may  remain  positive.  The  direction  of 
local  motion  is  perpendicular  to  edge  orientation  at  zero-crossing  and  the  displace¬ 
ment  is  limited  to  two  pixels. 
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e)  Combining  the  directions  of  local  motion  along  the  connected  zero-crossings  by 
vector  analysis,  the  complete  motion  is  derived.  The  results  were  all  satisfactory. 
.An  example  is  given  in  Fig.  2.6.  Note  that  the  examples  are  presented  in  the  image 
plane  for  easy  visualization. 
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3.  LONG  RANGE  QUALITATIVE  MOTION  DETECTION  ALGORITHM 

A  qualitative  algorithm  for  long-range  motion  detection  has  been  developed  based 
upon  heuristic  rules.  It  has  been  tested  with  both  synthetic  and  real  images  and  it  has 
produced  correct  results  in  all  instances.  The  assumptions  necessary  for  application  of 
the  algorithm  are  that  the  image  has  been  segmented  and  binarized  and  that  a  binary  edge 
is  available  in  the  computation  plane. 

Four  cases  were  considered  which  can  be  combined  to  produce  more  complex 
scenes: 

a)  Objects  enclosing  the  optical  axis. 

b)  Objects  "enclosing  the  optical  axis  slightly."  It  is  necessary  to  make  this  distinc¬ 
tion  between  cases  (a)  and  (c)  due  to  the  discrete  nature  of  the  process.  For  a  con¬ 
tinuous  edge,  only  (a)  and  (c)  are  necessary. 

c)  Objects  not  enclosing  the  optical  axis. 

d)  More  than  one  object,  of  any  type,  in  the  field  of  view. 

For  objects  enclosing  the  optical  axis,  the  qualitative  algorithm  is  based  on  having 
the  difference  image  between  the  perturbed  and  the  original  images  in  the  computation 
plane;  these  are  called  "current  image  (Cl)"  and  "previous  image  (PI),"  respectively. 

The  qualitative  algorithm  is  as  follows: 

Eight  (8)  position  counters  are  required  to  keep  track  of  relative  positions  of  points  in  the 
Cl  and  the  PI.  These  counters  are  called  cntcl ,  cntc2,  cntcS.  cntc4,  and  cntpl,  cntpl, 
cntpj,  cntp4  and  they  determine  which  points  in  each  of  the  frames  are  in  which  of  the 
four  quadrants.  Since  the  polar  angle  is  mapped  to  the  v-axis  in  the  complex  plane,  the 
four  quadrants  represent  four  distinct  regions  along  it.  The  four  regions  over  the  v- 
domain  are  shown  below.  They  are 

-rc/2  <  V  ^  rt/2  (i ) 

7t/2  <  V  <  371/2  {ii ) 


0  <  V  <  71 


{Hi ) 


7t  <  V  ^2n  (iv) 

The  first  two  regions  are  used  to  detect  horizontal  motion,  or  motion  along  the  x-axis, 
and  the  other  two  to  detect  vertical  motion,  or  motion  along  the  y-axis. 

Once  the  number  of  jjoints  in  each  region  has  been  obtained,  the  following  set  of 
rules  decides  the  direction  of  motion. 

If  there  are  more  Cl  points  in  Region>(i)  than  there  are  PI  points  in  Region-(ii)  and, 
at  the  same  time,  more  PI  points  in  Region-(i)  than  there  are  Cl  points  in  Region- 
(ii),  then  the  motion  is  in  the  +x  direction.  In  other  words,  if  the  above  critena  is 
true,  the  object  has  moved  to  the  right  of  its  original  position. 

In  a  similar  manner,  if  the  number  of  Cl  points  in  Region-(ii)  is  greater  than  the 
number  of  points  in  Region-(l),  and,  at  the  same  time,  the  number  of  PI  points  in 


II) 


Region<(ii)  is  greater  than  the  number  of  Cl  points  in  Region-(i),  the  motion  is  in 
the  -X  direction.  In  other  words,  the  object  has  shifted  to  the  left  of  its  onginal  posi¬ 
tion. 

An  object  is  selected  to  be  moving  up  from  its  original  position  if  there  are  more  Cl 
points  in  Region-(iii)  than  there  are  PI  points  in  Region-(iv)  and.  at  the  same  time, 
more  PI  points  in  Region-(iii)  than  there  are  Cl  points  in  Region-(iv). 

For  downward  modon,  the  number  of  Cl  points  in  Region-(iv)  has  to  be  greater  than 
the  number  of  PI  points  in  Region*(iii)  and,  simultaneously,  the  number  of  PI  points 
in  Region-(iv)  has  to  be  greater  than  the  number  of  Cl  points  which  fall  in  Region- 

(iii). 

The  verbal  description  of  the  detection  of  horizontal  and  vertical  motion  can  be 
confusing.  A  mathematical  formulation  describes  the  situation  more  succinctly.  Let  Np 

be  the  number  of  PI  points  in  Region-(i).  By  the  same  token,  N  ,  N  N  represent  the 

^2  ''2 

number  of  PI  points  in  Region-(ii),  and  the  number  of  Cl  points  in  Region-(i)  and 
Region-(ii)  respectively.  Then,  if 

N  >  N  and  N_  >  N_ 

^2 

motion  is  in  the  +x  direction.  Using  the  same  convention,  if 

N  >  N  and  N-  >  N 

'-2  r,  f-j  L,, 

motion  is  in  the  -x  direction. 

Defining  Np  .  Np  ,  .  N^  to  represent  the  number  of  PI  and  Cl  points  in  Region* 

(iii)  and  Region-(iv)  respectively,  if 

N„  >  N_  and  N_  >  N_ 

r,  L4 

motion  is  in  the  +y  or  upward  direction.  Similarly,  if 

N  >  Np  and  N  >  N 

'^4  "3  *^4  ^-3 

motion  is  in  the  -y  or  downward  direction. 

3.1.  Algorithm  Testing 

The  Qualitative  Algorithm  was  tested  to  have  performed  satisfactorily  with  both  line 
images,  and  synthetically  generated  two  dimensional  images.  One  of  the  major  assump¬ 
tions  made  was  that  an  edge  image  was  available  to  the  algonthm.  In  both  cases  certain 
aspects  of  the  images  were  ideal.  The  chief  of  these  idealizations  was  that  the  edges 
were  well  defined  and.  in  most  case,  one  pixel  width  thick.  Noise  was  introduced  in  the 
images  to  check  the  robustness  of  the  algorithms,  but  the  edges  of  the  objects  weie  still 
well  defined  because  of  the  low  content  of  the  noise.  In  other  words,  in  all  of  the  test 
cases,  the  signal-to-noise  ratio  of  the  information  content  of  the  image  was  extremely 
high. 
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The  algorithm  was  then  tested  with  real  images,  and  it  was  noted  that  the  edges 
were  not  so  well  defined.  In  order  to  remedy  this  situation,  a  very  high  contrast  back¬ 
ground  was  used.  This  provided  much  better  edge  information.  Once  the  edges  were 
obtained,  the  motion  of  the  object  was  computer  simulated.  Tests  with  actual  physical 
motion  of  the  objects  were  not  run. 

3.2.  Problems  With  Real  Images 

Real  images  do  not  display  the  ideal  properties  of  either  line  or  synthetically  gen¬ 
erated  images.  The  actual  information  in  the  images  is  quite  heavily  overlaid  by  noise 
and  any  attempt  made  to  filter  out  the  noise  resulted  also  in  loss  of  information  and  hence 
distortion  of  the  image.  Another  problem  encountered  was  that  of  the  background.  In 
the  images  used  earlier,  the  background  was  of  a  uniform  intensity  distribution.  This, 
then,  made  it  a  completely  negligible  entity  since  it  had  no  features  which  would  either 
add  to  or  reduce  the  complexity  of  the  preprocessing  algorithm.  The  background  plays 
an  imponant  pan  in  finding  the  difference  images  which  are  needed  to  find  the  direction 
of  morion  with  the  qualitative  algorithm.  .  - 

The  Gradient  operator  originally  used  to  detect  edges  was  highly  susceptible  to 
noise.  Hence,  a  new  operator,  viz.  the  Sobel  Operator,  was  utilized  to  perform  edge 
detection.  T  ugh  much  better  than  the  Gradient  Operator  as  far  as  noise  sensitivity  is 
concerned,  the  Sobel  Operator  produces  edges  which  are  on  average  three  pixels  wide. 
Since  one  of  the  ideal  conditions  assumed  was  that  the  width  of  the  edges  was  only  one 
pixel,  this  caused  a  serious  problem  in  the  application  of  the  algorithms.  The  images  thus 
have  to  be  heavily  preprocessed  in  order  for  them  to  be  usable  with  the  qualitative  algo¬ 
rithm. 

3.3.  Schemes  for  Preprocessing  Images 

A  number  of  schemes  were  attempted  to  solve  the  above  mentioned  problems.  The 
schemes  were  all  extremely  time  consuming.  A  discussion  of  such  problems  and  the 
schemes  used  to  counter  follows. 

The  first  scheme  was  used  to  eliminate  the  background  from  the  images.  A  rem¬ 
inder  here  that  the  qualitative  algorithm  uses  two  images,  the  second  taken  a  time  T  after 
the  first,  and  based  on  the  positional  information  derived  from  such  an  operation,  predicts 
the  direction  of  travel.  Since  the  background  was  common  to  the  two  images,  it  may  be 
wondered  why  it  has  to  be  eliminated  separately.  The  reason  is  that  a  difference  image 
of  the  two  frames  needs  to  be  formed.  If  the  background  is  not  eliminated,  it  is  difficult 
to  segment  the  object  from  the  background  after  the  difference  has  been  found.  It  was 
thus  easier  to  eliminate  the  background  from  the  images  before  computing  the  difference. 

The  following  sequence  was  followed.  The  first  step,  or  filter,  enhanced  the  edges 
in  the  images.  This  was  done  by  convolving  the  image  with  a  Sobel  Operator.  The 
second  step  was  to  binarize  the  image.  It  is  easiest  to  deal  with  binary  images  because 
segmentation  is  an  important  aspect  of  the  sequence.  The  images  thus  obtained  were 
then  stored  for  use  by  another  filter  which  subtracted  first  the  background  information 
from  both  frames  and  then  the  two  frames  from  each  other. 

Though  the  above  successfully  eliminated  the  problem  of  interference  of  the  back¬ 
ground,  one  problem  remained;  the  thickness  of  the  edges.  This  becomes  a  problem  in 
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terms  of  speed  of  computation  because  for  an  n  point  representation  pf  the  edge,  the  total 
number  of  operations  performed  by  the  algorithm  is  of  the  order  of  n*. 

Instead  of  using  edge  thinning  to  solve  this  problem,  another  less  time  consuming 
scheme  was  attempted  and  has  provided  quite  good  results.  This  scheme  introduces  two 
extra  steps  in  addition  to  the  sequence  mentioned  above.  After  binarizing  the  image,  a 
second  edge  enhancement  operation  is  performed  on  the  image,  this  time  using  a  gra¬ 
dient  operator.  The  gradient  operator  produces  edges  which  are  two  pixel  wide,  thus 
reducing  the  total  number  of  pixels  representing  the  edge. 

After  the  image  has  been  preprocessed,  the  two  images  are  subtracted  from  each 
other.  An  average  position  of  difference  is  calculated  and  the  edges  are  then  recorded 
with  respect  to  it.  The  average  position  of  difference  is  nothing  more  than  a  point 
represented  by  the  average  x-coordinate  and  the  average  y-coordinate  over  the  total 
number  of  points  of  same  intensity  on  the  two  frames.  This  allows  the  qualitative  algo¬ 
rithm  for  centred  figures  to  be  used  exclusively  with  all  images. 

It  should  be  mentioned  here  that  the  qualitative  algorithm  being  used  differs  slightly 
from  the  scheme  utilized  with  synthetic  images.  In  the  earlier  case,  an  edge  representa¬ 
tion  was  required,  and  the  number  of  current  image  and  previous  image  points  in  the 
difference  image  were  used  exclusively  to  calculate  the  direction  of  motion.  In  the 
scheme  used  here,  due  to  the  extremely  noisy  image,  it  is  not  feasible  to  trace  a  contour. 
Even  though  the  algorithm  used  earlier  did  not  require  continuous  edges-a  gap  of  up  to  5 
pixels  width  could  be  tolerated— it  was  discovered  that  it  fail^  to  perform  well  with 
images  obtained  from  the  camera.  Further  testing  is  being  carried  out  on  this  aspect  of 
the  algorithm. 

One  major  difference  between  the  present  use  of  the  qualitative  algorithm  and  the 
previous  use  is  the  following:  When  using  synthetic  or  line  images,  since  the  images 
underwent  a  transformation  with  a  computer  program,  the  number  of  points  which  consti¬ 
tuted  the  edges  remained  constant.  Hence  the  probability  of  incorrectly  selecting  a  direc¬ 
tion  of  motion  was  not  very  high  since  the  number  of  points  in  any  region  depended 
stnctly  on- the  mapping.  In  other  words,  since  the  qualitative  algorithm  uses  the  number 
of  pomts  in  the  four  quadrants  as  its  basis  for  determination,  the  inequalities  in  the 
number  of  points  in  the  various  regions  were  due  to  the  placement  of  the  same  number  of 
points  in  each  frame.  Preprocessing  affects  different  frames  differently  in  the  case  of  real 
images.  Thus,  the  number  of  physical  pixels  which  constitute  the  current  image  may  be 
greater  or  smaller  than  the  number  of  pixels  which  constitute  the  previous  edge  image. 
Fortunately,  The  difference  is  negligible.  Since  the  typical  number  of  points  is  of  the 
order  of  10,000,  a  difference  of  a  100  pixels  or  so  does  not  constimte  a  serious  error. 

3.4.  Examples  and  Further  Discussion 

In  the  research  conducted  on  real  images  it  was  found  that  the  above  aspect  of  the 
qualitative  algorithm  was  unimportant.  The  new  algorithms  were  tested  with  3  real 
images.  The  test  objects  used  were  model  automobiles. 

The  results  were  very  encouraging.  It  was  found  that  even  though  the  pre-processed 
images  contained  all  the  problems  mentioned  above,  the  qualitative  algorithm  correctly 
predicted  the  direction  of  motion  in  each  case.  The  test  results  arc  shown  below. 
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In  Fig.  3.1,  a  car  is  shown  in  its  initial  and  moved  positions.  The  movement  is  to 
the  left  and.  since  the  car  rests  on  a  slight  incline,  down.  The  images  are  then  prepro¬ 
cessed  in  the  sequence  described  earler.  A  second  filter  is  then  used  to  subtract  the  back¬ 
ground  from  the  images  and  then  the  images  from  each  other.  The  first  step  is  shown  in 
Fig.  3.2,  and  the  difference  image  shown  in  Fig.  3.3.  Note  the  straight  line  in  Fig.  3.3 
div  iding  the  image  into  two  halves.  It  is  in  reference  to  this  line  that  the  position  of  the 
objects  is  calculated.  The  equation  of  this  line  is  given  by; 

X  =  average  -x  -coordinate 

.As  noted  earlier,  both  the  x-coordinate  and  the  y-coordinate  of  the  average  position  are 
needed.  Only  one  has  been  shown  here  for  sake  of  simplicity.  The  position  information 
of  the  points  which  constitute  the  current  (Cl)  and  the  previous  (PI)  frames  was  previ¬ 
ously  stored  so  the  new  set  with  respect  to  the  average  position  of  difference  is  easily 
obtained.  This  data  is  then  presented  to  the  direction  determining  program  which  first 
performs  the  complex  logarithmic  conformal  mapping  on  the  set  of  data  points  and  then 
decides  upon  a  direction  of  motion  based  on  where  the  points  fall  in  the  complex  plane. 
Referring  back  to  the  qualitative  algorithm,  and  comparing  it  against  Table  3.1,  we  note 
that  the  conditions  for  motion  to  the  left  and  down  are  met.  Hence  the  algorithm  decides 
down  and  left  as  the  directions  of  motion.  A  second  example  is  shown  in  Figures  3.4,  3.5 
and  3.6.  The  data  is  once  again  tabulated  and  is  shown  in  Table  3.2.  As  can  be  seen 
from  the  data  and  the  above  relations,  the  motion  of  the  car  is  to  its  right  and  in  a  down¬ 
ward  direction. 

Once  again  it  is  emphasized  that  m  the  new  use  of  the  algorithm  the  only  use  we 
have  of  the  intersection,  or  difference,  image  is  to  locate  a  central  point.  The  igorithm 
performs  its  operation  on  the  two  preprocessed  images. 

Multiple  Objects  In  the  earlier  work  with  line  images,  it  was  shown  that  the  qualitative 
algorithm  could  handle  more  than  one  object  in  the  field  of  view.  The  objects  were  not 
bound  to  each  other  in  any  sense.  In  other  words,  the  different  objects  could  undergo  dif¬ 
ferent  transformations  and  the  qualitative  algorithm  could  detect  the  objects  and  then 


Region 

Number  of  Points 

Previous 

Frame 

Current 

Frame 

(i) 

1425 

2173 

(ii) 

3181 

2433 

(iii) 

1299 

3307 

(iv) 

1526 

3080 

Table  3.1  Motion  to  the  left  and  down 


Figure  3.4  Ca 


urc  3.5  liiiciv?i’  ill  U  r  pi iiiroi issi 


References 


;o 


1.  Gilad  Adiv,  "Determining  the  Three-Dimensional  Motion  and  Structure  from  Opti¬ 
cal  Row  Generated  by  Several  Moving  Objects."  IEEE  Transactions  on  Pattern 
Analysis  and  Machine  Intelligence,  vol.  PAMI-7,  July  1985,  pp.  384-401. 

2.  B.  F.  Buxton  and  H.  Buxton,  "Computation  of  Optic  Flow  from  the  Motion  of  Edge 
Features  in  the  Image  Sequences,"  Image  &  Vision  Comput.,  vol  2,  May  1984,  pp. 
59-75. 

3.  R.  Jain,  "Extraction  of  Motion  Information  from  Peripheral  Processes,"  IEEE  Tran¬ 
sactions  on  Pattern  Analysis  and  Machine  Intelligence,  vol.  PAMI-3,  No  5,  Sep¬ 
tember  1981,  pp.  489-503. 

4.  Ramesh  Jain,  Sandra  L.  Banlett  and  Nancy  O’Brien,  "Motion  Stereo  Using  Ego- 
Motion  Complex  Logarithmic  Mapping,"  IEEE  Proceedings,  1986,  pp.  188-193. 

5.  Ramesh  C.  Jain,  "Segmentation  of  Frame  Sequences  Obtained  by  a  Moving 
Observer,"  IEEE  Transactions  on  Pattern  Analysis  and  Machine  Intelligence,  vol. 
PAMI-6,  September  1984,  pp.  624-629 

6.  K.  Prazdny,  "Determining  the  Instantaneous  Direction  of  Motion  from  Optical  Flow 
Generated  by  a  Curvilinearly  Moving  Observer,”  Computer  Graphics  and  Image 
Processing,  vol.  17,  November  1981,  pp.  238-48. 

7.  K.  Prazdny,  "On  the  Information  in  Optical  Flow,"  Computer  Vision,  Graphics  and 
Image  Processing,  vol.  22,  May  1983,  pp.  239-59. 


21 


4.  A  FAST  ALGORITHM  FOR  MOTION  PREDICTION 


4.1  Introduction 

A  sensor  for  machine  vision  with  biological  visual  features  has  been  studied  by 
several  researchers  [1,2, 3, 4]  based  on  Schultze’s  model  of  the  retina  [5].  According  to 
this  model,  the  light  sensitive  array  of  elements  equivalent  to  the  retina,  called  the 
“image  plane”  has  receptive  fields,  or  pixels,  of  increasing  size  towards  the  periphery, 
except  for  a  small  region  at  the  center,  the  fovea,  at  which  pixels  are  of  equal  size  and 
uniform  distribution.  The  fovea  is  the  region  of  highest  visual  acuity.  Outside  the  fovea, 
the  pixels  are  distributed  in  rings  whose  radius  increases  in  size  exponentially.  Each  ring 
contains  the  same  number  of  pixels,  all  of  the  same  size  in  a  given  ring.  This  pixel 
configuration  in  image  plane  is  mapped  to  the  “computation  plane”,  equivalent  to  the 
visual  cortex  by  a  logarithmic  conformal  transformation.  Figure  4.1  illustrates  the  sensor 
configuration  both  in  image  plane  and  computation  plane.  The  transformation  holds  for 
all  points  outside  the  fovea  and  it  is  easy  to  see  that  objects  scaled  or  rotated  about  the 
optical  axis  in  image  plane  remain  invariant  in  shape  in  computation  plane,  with  shifts 
along  the  u  and  v  axis,  respectively,  for  scaling  and  rotation.  These  properties  represent 
important  advantages  for  image  processing  [6,7].  Examples  of  these  properties  can  be 
found  in  [4]. 

When  objects  are  translated  in  the  image  plane,  however,  the  transformation  pro¬ 
duces  distorted  images  in  computation  plane  for  which  it  is  extremely  difficult  to  perform 
motion  prediction  because,  in  effect,  a  simple  translation  in  image  plane  corresponds  to 
nonrigid  motion  in  computation  plane. 

The  receptive  fields  in  a  biological  visual  sensor  (BVS)  are  formed  by  grouping 
large  numbers  of  elementary  sensors  (cones  and  rods).  There  seems  to  be  a  one  to  one 
correspondence  between  receptive  fields  and  elementary  sensors  in  the  fovea  region 
[5,8].  Outside  the  fovea,  translation  on  the  image  plane  can  be  analyzed  in  a  simpler  way 
if  elementary  sensors  are  grouped  to  form  essentially  a  rectangular  grid.  Depending  on 
the  visual  tasks  to  be  performed,  different  areas  in  the  visual  cortex  are  used.  The  visual 
cortex  in  primates  consists  of  approximately  ten  separate  areas  which  are  functionally 
distinct,  but  complexly  interrelated  [8]  with  functions  not  yet  understood.  In  a  vision  sys¬ 
tem  emulating  some  of  the  features  of  biological  systems,  grouping  elementary  sensors  in 
a  rectangular  grid  to  perform  tasks  such  as  tracking,  is  thus  justified.  If  a  single  rectangu¬ 
lar  sensor  is  used,  the  non-uniform  exponential  configuration  (or  “logspiral”)  is  obtained 
from  the  sensor  by  means  of  a  logic  circuit  [17]. 

In  this  section,  an  original  one  dimensional  correlation  tracker  for  motion  prediction 
is  developed  and  used  in  both  configurations.  Two-D  motion  prediction  is  performed  in 
the  rectangular  tesellation  image  plane  and  the  rotation  and  scaling  motion  is  performed 
in  logspiral’s  computation  plane. 


4.2.  The  Tracking  Algorithm 

The  most  relevant  characteristic  of  the  algorithm  to  be  described  below  is  that  the 
motion  prediction  problem  is  solved  without  using  correspondence.  The  point-to-point 
correspondence  (defined  as  :  “what  point  in  the  new  frame  corresponds  to  a  given  point 
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in  the  previous  frame.”)  problem  is  considered  by  many  researchers  to  be  very  difficult, 
or  even  unsolvable  in  practical  cases  (9.10.11].  Several  approaches  have  been  proposed 
to  solve  the  motion  prediction  problem  without  using  correspondence  [12,13].  The 
approach  proposed  in  this  paper  is  different  from  those  and  produces  good  experimental 
results  with  relatively  few  computations.  Translational  motion  prediction  could  be  used, 
in  conjunction  with  the  rotation  and  magnification  properties  of  the  logspiral  sensor 
configuration,  to  predict  motion  in  three  dimensions. 

Consider  a  dynamic  scene.  In  general,  the  intensity  of  the  light  reflected  by  the 
scene  will  be  a  function  of  location  and  lime,  /(jc.y.O-  We  can  define  its  gradient  in  this 
3D  space, 

T 

f  3/  3/  3/^ 

W= - ,  (4.1) 

I  3ac  dy  dr  J 


and  its  gradient  in  2D  geometric  space. 


V./ 


r  3/  3/1 
,  3x  dy. 


(4.2) 


Let  «■  be  an  arbitrary  unit  vector  in  (x,y,r)  space.  Then  the  directional  derivative  of  /(x.r) 
in  the  direction  of  u  is 

dl 

—  =  uVl  (4.3) 

ds 

where 

ds  =  dxi  dyj  +  dik 


For  uniform  illumination,  changes  in  intensity  at  a  point  are  due  to  object  motion 
(assume  that  other  disturbances  are  inhibited).  If  the  intensity  of  a  point  in  the  object  does 
not  change  with  respect  to  s  (in  other  words,  an  infinitesimal  spatial  displacement 

dJ 

corresponds  to  a  change  dr  in  time  between  consecutive  frames),  then  —  =0.  Let  the  unit 


vector  u  be  given  by 


ds 


u  =  c(v*(*.r).l) 


(4.4) 


2  2 

where  c  =  ( iv  i  -  1)  and  V(x_,t)  is  the  point  velocity.  Then,  combining  (4.3)  and  (4.4)  we 
have 

3/ 

+  —  =  0  (4.5) 

dr 


Eq.  (4.5)  is  known  as  the  constraint  equation  relating  the  spatial  gradient  to  the  temporal 


derivative  tor  a  moving  object  [14),  Let  t  be  the  time  between  two  successive  frame' 
:  =  -  r .  VVe  then  have 


A(x ./ )  =  iv*(  t  .n 


=  (At  Av 


3/  l(x  .1)  - 1  .t  -  z) 


The  approximation  of  (-1.7)  is  accurate  as  long  as  the  object  motion  is  relatively  small 

dl 

Using  finite  increments  in  (4.5)  and  replacing  7(x.t\  from  (4  6)  and  —  from  i4.7i. 

f)/  3/ 

At  Ay  *  Ux  .n  -  I  (x  ,t  -  T )  =  0 
dx  dv 


dl  dl 

/(t./,)  =  /(t,/,)+  At  Ay  (4.8) 

In  order  to  deal  with  large  object  motion,  a  recursive  process  is  essential.  Thus,  to  obtain 
an  Iterative  algorithm,  we  proceed  as  follows; 

Let’s  consider  onlv  one-dimensional  motion  for  the  moment.  Rewrite  (4  8)  as 


.(|)  -  /(t,/,)| 


df  (x .(,) 


where 


(RJ  =  Sampled'  ! {i^.t .)[ 

(T  }  =  Sampled  /  ( t , 

/  r  =« 


and  t,  is  an  initial  pixel  position  in  the  first  frame,  t^  is  an  arbitrary  chosen  pixel  position 
in  the  second  frame.  Normally,  we  will  choose  t,  to  be  equal  to  t  unless  a  prion  infor¬ 
mation  about  x^  IS  provided.  Thus,  for  any  pixel  and  for  a  discrete  ca.se,  we  have 


(x  ,/,) 


^  r  .  -  T 


Now.  define  the  correlation  of  {T^j  and  {R^}  as 


(4.11) 


Combine  (4.ioi  and  (4. in  together  and  for  Ax_=Ax^  for  all  y, yield 

m  m 

./=l  ;=1 


Similarly,  for  Ay^  expression,  we  have 


,'=1  1=1 


where 


{R^}  =  Sampledl  /O',,  4,)! 


(4.13) 


{T  J  =  Sampled]  l(y.i^) 

^  w  *■  >  *> 


In  other  words,  Av.  is  estimated  in  a  column  direction,  and  >  are  as  before.  To  start  the 
algorithm  above,  we  let 

Axq  =  0 

(4.141 

*c  “  S-;  -^.-1 

V  =  V  .  +  Av  , 

•  -  J - »  •  « -I 


and  R  is  evaluated  for  the  next  iteration  as 


R  :  =  Sampled  /(t.v.r.i 


(4.15) 


-■  **  * 


(4.16) 


The  Iteration  process  stops  when  the  following  conditions  occur 


or 


Ai  =  0. 

I 

Av.  =  0. 


(4.17) 


After  stopping  the  iteration  process,  the  final  estimated  expressions  are 

A 

Ax  =  XAt, 

A 

A)’  =  SA)-, 

1=1 


(4.18) 


where  n  is  the  number  of  iterations.  Equation  (4.18)  gives  us  the  estimates  within  a  frac¬ 
tion  of  pixel  accuracy.  The  final  estimates  can  also  be  obtained  equivalently  from  the  fol¬ 
lowing  equations. 


Ax  =  X,  -  Xq 

Ay  =  >,  -  yo 


(4.19) 


These  expressions,  however,  can  only  give  us  the  estimates  within  an  integer  value. 

The  algorithm  developed  above  is  a  spatio-temporal  type  algorithm  (gradient  algo¬ 
rithm)  which  is  very  attractive  as  far  as  hardware  implementation  is  concerned  [15,16]. 
The  algorithm  utilizes  only  a  point-by-point  basis  (i.e.;  a  single-degree-of-freedom  corre¬ 
lation)  as  opposed  to  conventional  correlation.  Thus,  computation  time  is  significantly 
reduced.  But,  perhaps,  the  most  important  feature  is  its  ease  of  implementation  in  a 
parallel  fashion.  Horizontal  and  vertical  perturbations  can  be  computed  independently 
even  if  the  object  experiences  2-D  motion.  The  reason  for  this  is  that  (4.12)  and  (4.13)  are 
based  on  the  individual  row  or  column.  Thus,  rows  and  columns  can  be  processed  simul¬ 
taneously.  Thus,  for  example,  an  image  of  size  64x64  pixels  will  need  2x/x64  cpu  seconds 
to  compute  the  perturbations,  whereas  for  the  parallel  processors  with  128  processors,  the 
time  required  is  t  seconds,  where  t  is  the  processing  time  to  compute  the  estimate  in  a 
single  row.  In  addition,  the  algorithm  requires  computation  of  gradients.  This,  in  general, 
causes  a  serious  problem  when  the  image  is  severely  degraded  by  noise.  In  the  real  image 
simulation,  we  will  demonstrate  empirically  that  the  algorithm  above  possesses  good 
noise  immunitv  and  the  computation  of  the  gradients  does  not  affect  the  estimates 


4.2.1.  2D  Motion  (a  general  case) 

The  correlation  tracker  has  been  derived  considering  motion  in  the  x  and  the  v 
directions  independently,  and  although  it  can  be  applied  to  motion  in  the  x  and  v  direc¬ 
tions,  in  general  it  will  not  produce  good  results  for  motion  in  both  directions  simultane¬ 
ously.  The  problem  for  general  x/y  translation  is  that  corresponding  rows  (and  columns) 
in  frames  i  and  ;+/  within  the  moving  area  will  be  shifted  with  respect  to  each  other.  For 
example,  if  Ax  =  3  pixels  and  Ay  =  5  pixels,  the  object  has  moved  right  by  3  pixels  and  up 
by  5  pi.xels  and  rows  in  frame  i-t-l  will  be  at  location  i+S  with  respect  to  corresponding 
rows  in  frame  i  (where  k  is  row  number).  Hence,  if  a  search  is  performed  for  each  row  in 
both  the  positive  and  the  negative  directions  to  find  the  best  matching  row  in  the  second 
frame,  this  problem  can  be  solved.  The  question  remains;  how  many  rows  (and  columns) 
must  compose  the  search  area?  This,  of  course,  will  depend  on  how  large  a  motion  will 
occur  between  frames  which,  in  mm,  depends  on  sampling  rate  and  object  speed.  A  rea¬ 
sonable  figure  is  to  allow  for  displacements  of  at  most  ten  percent  of  the  maximum  object 
dimension  in  pixels,  in  both  the  x  and  the  y  directions.  The  number  of  search  rows  will 
then  be  twice  the  maximum  object  dimension  plus  one,  in  pixels  units. 

The  row  under  investigation  is  correlated  with  all  the  rows  in  the  search  region  and 
a  pixel  by  pixel  estimate  is  performed.  The  row  which  produces  the  most  consistent  esti¬ 
mate  is  likely  to  be  the  matching  one.  This  can  be  best  illustrated  by  Fig.  4.2,  where  we 
search  3  rows  above  and  3  rows  below  the  kih  row  in  the  first  frame.  If  the  object,  for 
example,  moves  up  2  pixels,  then  the  (k-2)th  row  in  the  second  frame  is  perfecdy 
matched  to  the  row  kth  in  the  first  frame.  That  is,  the  estimate  obtained  from  correlating 
these  two  rows  using  (4.12)  is  a  consistent  estimate.  Others  will  produce  inconsistent  esti¬ 
mates.  This  will  become  clearer  in  the  simulation  on  a  Gaussian  image.  It  is  also  possi¬ 
ble  that  the  first  iteration,  although  producing  the  most  consistent  result,  will  not  suffice, 
so  the  search  region  is  decreased  to  a  fraction  of  its  previous  size  and  a  second  iteration  is 
performed.  After  the  first  iteration,  in  order  to  determine  which  is  the  row  with  the  most 
consistent  estimation,  a  histogram  of  pixel  estimates  is  generated  for  each  row  (or 
column).  The  histogram  with  the  smallest  variation  is  chosen  as  the  matching  one.  After 
this  first  iteration,  the  average  estimate  for  that  row  is  chosen  (nearest  integer)  as  the  first 
displacement  estimate  and  the  reference  image  (first  frame)  is  moved  by  that  number  of 
pixels  in  the  estimated  direction  of  motion.  A  new  iteration  is  now  performed  using 
(4.12)  and  the  process  is  repeated.  As  mentioned  above,  there  may  be  cases  in  which  the 
original  best  matching  row  is  not  the  same  for  the  second  iteration.  When  the  displace¬ 
ment  estimate  is  zero  or  a  fraction  of  a  row  less  than  one  half,  the  iteration  process  is 
stopped.  All  estimated  values  are  then  added  to  determine  the  total  displacement  esti¬ 
mate. 

4.2.2.  Simulation  with  a  Gaussian  Grey  Level  Image 

The  algorithm  was  tested  with  a  bi-valued  Gaussian  function  of  the  form 

p{x,y)=  lO'[P^p^(x,y)  +  P^^{x,y)] 

with 
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Figure  4.2  Row  search  mechanism 
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The  array  size  is  64x64,  -20  ^  x  ^  43,  -25  ^  y  <  38.  Figure  4.3  shows  the  grey  level 
im.age. 

The  synthetic  image  was  displaced  by  Ar=3,  Ay  =3  and  a  row  by  row  correlation  was 
performed  using  (4.12)  with  a  search  area  of  nine  rows.  The  algorithm  is  applied  only  to 
the  minimum  rectangle  enclosing  the  object  (tracking  window)  to  be  tracked,  hence  we 
are  assuming  that  segmentation  has  already  been  performed  offline.  This  needs  to  be 
done  only  at  the  start  of  the  operation.  In  other  words,  a  priori  information  about  target 
characteristics  such  as  its  size  and  its  location  with  respect  to  the  reference  frame  need  to 
be  known.  The  average  displacement  estimate  for  rows  in  the  region  containing  the 
image  when  correlated  with  rows  displaced  from  k-4  to  k+4  is  shown  in  Table  4.1.  The 
corresponding  histograms,  i.e.,  the  number  of  occurrences  vs  estimated  displacement  for 
rows  k  and  k-t-i,  i=-4  to  4,  are  shown  in  Fig.  4.4.  Comparing  the  table  with  the  histogram, 
a  best  match  is  obtained  for  a  row  shift  equal  to  three,  with  a  Ax  =  2.  In  this  case,  the 
matching  rows  in  the  second  frame  produce  consistent  estimates  ^f  Ax.  Other  row  shifts 
give  estimates  of  Ax  which  vary  widely  in  both  magnitude  and  direction.  The  original 
image  is  now  shifted  by  2  pixels  to  the  right  and  a  new  iteration  is  performed.  A  similar 


procedure  is  applied  to  columns  alternatively  (or  in  the  case  of  parallel  processing,  simul¬ 
taneously).  This  second  iteration  produces  a  displacement  estimate  of  Ax  =  l.OO.  The 
process  is  repeated  and  the  next  iteration  results  in  Ax=0.00.  thus  the  iterative  procedure  is 
ended  and  the  estimated  displacement  in  the  x  direction,  according  to  (4.18)  is  Ax  =2.93930 
units  or,  in  integer  pixels,  3  pixels. 

The  procedure  is  applied  simultaneously  to  rows  and  columns,  i.e.,  after  the  first  row  itera¬ 
tion.  the  first  column  iteration  is  performed,  etc.  Table  4. II  indicates  the  results  for  x  and  > 

Table  4.1  Row  correlation  for  search  area,  a  Gaussian  image 


Correlation  With  Row 


row  k 

k-4 

k-3 

k-2 

k-1 

k 

k+\ 

k-^2 

k-(-3 

k-(-4 

31 

-1229.91 

-330.37 

-119.03 

-56.27 

-42.52 

170.40 

5.49 

1.90 

2.15 

32 

-394.03 

-126.39 

-50.32 

-26.25 

-19.56 

70.79 

2.61 

1.90 

3.29 

33 

-132.70 

-46.85 

-20.10 

-9.92 

-5.68 

3.57 

0.23 

1.90 

4.66 

34 

-43.65 

-16.06 

-6.02 

-1.17 

3.67 

-23.65 

-1.71 

1.90 

6.34 

35 

-13.21 

-3.52 

0.89 

4.42 

12.47 

-23.96 

-3.05 

1.90 

8.91 

36 

-1.16 

3.67 

7.11 

12.89 

67.83 

-13.42 

-2.98 

1.90 

12.48 

37 

301.63 

-188.25 

-51.52 

-20.50 

-8.72 

-2.67 

1.11 

1.86 

-17.34 

38 

-5.53 

-4,52 

-2.84 

-0.35 

3.32 

7.80 

9.38 

1.83 

-7.96 

39 

-1.27 

0.37 

3.17 

8.04 

15.68 

21.39 

13,88 

1,83 

-5.14 

40 

2.71 

6.10 

12.52 

24.18 

37.39 

31.69 

13.52 

1  83 

-3.77 

41 

9.26 

17.43 

33.79 

58.13 

59.03 

31.46 

11,55 

1.83 

-2.67 

42 

22.41 

43.81 

83.73 

102.40 

60.65 

26.03 

9,47 

1.83 

-1.57 

i  43 

53.39 

113.37 

174.90 

112.50 

48.33 

19.82 

7.31 

1.83 

-0.43 

!  44 

145.77 

321.73 

232.65 

87.07 

34.20 

13.76 

5.32 

1.83 

0.79 

'  45 

925.65 

1216.69 

191.13 

57,86 

21.44 

8.49 

3.42 

1.83 

2.04 

'  46 

-443.61 

12935.43 

115.41 

31.20 

10.58 

3.66 

1.62 

1.83 

3.35 

47 

-171.40 

-1654.64 

51.91 

10.39 

1.28 

-0.68 

-0,09 

1,83 

4.78 

48 

-68.31 

-712.92 

1.15 

-7.25 

-6.84 

-4.70 

-1.76 

1.83 

6.25 

49 

-0.09 

692.00 

-38.54 

-21.48 

-13.90 

-8.27 

-3.32 

1.83 

7.76 

50 

47.63 

880.75 

-70.04 

-33.86 

-19.93 

-11.60 

-4.82 

1.83 

9.46 

:  51 

84.12 

7170.58 

-91.99 

^2.33 

-24.92 

-14.47 

-6,17 

1.83 

11.11 

52 

107.24 

1975.81 

-111.11 

-51.17 

-29.73 

-17.15 

-7.66 

1.83 

12.92 

53 

123.95 

8112.47 

-124.42 

-57.23 

-33.06 

-19.67 

-8,82 

1.83 

14.73 

54 

132.00 

1557.10 

-147.05 

-63.92 

-37.97 

-22.17 

-10.26 

1.83 

16.50 

55 

137.75 

1363.57 

-151.96 

-69  80 

-40.18 

-24.09 

-11.11 

1.84 

19.00 

56 

140.48 

1465.12 

-169.98 

-73.99 

^3.33 

-25.88 

-12.37 

1.89 

20.67 

57 

190.06 

-1307.00 

-129.74 

-66.07 

-39.85 

-24.97 

-12.15 

1.91 

22.781 

58 

137.04 

817.50 

-210.38 

-79,37 

-52.29 

-29.95 

-15.60 

2.10 

22.71 

59 

131.95 

1930.00 

-139.89 

-64.17 

-42.60 

-29.71 

-13,40 

3.00 

30.5d 

i-v-Vy 


estimates. 


Due  to  the  low  resolution  used  for  the  sensor  in  this  example,  there  are  situations  in  which 
the  estimated  displacement  contains  significant  error.  Table  4. Ill  summarizes  the  results  of 
several  displacements  computed  by  either  (4.18)  or  (4.19).  Notice  that,  in  general,  the  results 
obtained  from  (4.19)  are  better,  although  when  they  are  rounded  up  to  an  integer  number  of  pixels 
(as  for  example  in  the  last  case,  Aj:=— l,  Ay=^l)  there  can  still  be  an  error.  This,  however,  lies 
within  the  quantization  error  of  ±'A  pi.xel  and.  if  the  motion  is  the  same  but  the  sensor  resolution 
IS  increased,  the  results  are  better  (see  [17]  for  a  teal  image  2D  translation  simulation). 

4 J.  TRANSLATION,  ROTATION,  AND  SCALING 

To  track  3D  motion,  the  two  planes  (rectangular  and  logspiral’s  computation)  wiU  be  used 
simultaneously.  Dunng  a  processing  period,  information  from  each  plane  will  be  passed  back  and 
forth.  This  is  important  due  to  the  fact  that  if  a  logspiral  sensor  (image  plane)  is  used,  translation 
in  image  plane  produces  a  distorted  image  in  computation  plane.  On  the  other  hand,  rotation 
about  and  traitslation  along  the  optical  axis  are  dealt  with  easily  using  the  logspiral  sensor  (com¬ 
putation  plane),  but  are  difficult  to  deal  with  in  the  rectangular  (image  plane)  sensor.  Thus,  to 
accomplish  the  task,  the  logspiral  sensor’s  computation  plane  and  the  rectangular  sensor’s  image 
plane  will  have  to  interact.  There  is  useful  information  in  both  of  them,  depending  on  the  type  of 
motion,  that  can  be  used  to  optimize  the  process.  Notice  that  rotation  and  scaling  considered  here 
are  both  with  respect  to  the  optical  axis.  Thus,  no  rotation  of  the  image  plane  on  other  axes  is 
allowed. 

Let's  consider  a  combination  of  translation  and  rotation  for  the  moment  If  we  can  arrange 
a  group  of  processors  in  such  a  way  that  each  of  them  is  responsible  for  a  different  object  orienta¬ 
tion.  then  it  may  be  possible  to  estimate  i  and  y  perturbations  plus  rotahon  with  respect  to  the 
optical  axis  simultaneously.  This  arrangement  is  analogous  to  a  model  proposed  for  the  visual 

Table  4, II  A  summary  of  horizontal  and  vertical  perturbations  for  a  Gaussian  unage 


‘  Displacement  At  =  3.0.  Ay  =  3.0 

J 


Iteration 

1 

1 

1 

1.93930  1 

1.99919 

2 

1.00000  : 

t  1.00000 

3 

1 _ 

0.00000  j 

0.00000  j 

1  1 

1 _ ; 

3 

1 


hx  =  =  2.93930 


Ay  =  Y^y,  =  2.99919 


k  •  •  •  * 

K'N'!- 

m 


>C'  *  * 


H."  r ■  r ■  ^  V'  ■,**  W 


Table  4. Ill  More  simulations  for  a  Gaussian  image 
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cortex  [18].  That  is,  we  will  use  the  tracking  windows  with  different  orientations.  These  win¬ 
dows  will  be  used  by  each  processor  which  in  turn  computes  the  translational  parameters.  We 
then  look  at  the  most  consistent  estimates.  In  other  words,  we  will  use  the  same  procedure  as 
before  except  that  we  now  apply  the  algorithm  to  various  orientations  instead  of  one.  The  con¬ 
sistent  estimates  should  occur  when  the  reference  and  the  target  have  the  same  orientation  as  well 
as  when  its  rows  and  its  columns  are  perfectly  matched.  This  technique  can  be  extended  to  case 
of  translarion,  rotation,  and  scaling.  All  we  have  to  do  is,  at  a  given  orientation,  is  to  generate 
various  tracking  windows  each  with  a  different  scaling  factor.  Thus,  a  processor  having  a  tracking 
window  which  best  matches  the  target  both  in  orientation  and  scaling  will  give  the  highest 
response.  In  other  words,  it  gives  the  most  consistent  estimates.  Other  processors  will  also 
respond  but  their  estimates  will  not  be  as  consistent  as  the  one  having  the  best  match  tracking 
window.  This  is  where  a  similarity  between  this  technique  and  the  model  in  [18]  appears.  Notice 
that  this  is  not  the  same  as  the  conventional  correlation.  What  we  intend  to  do  here  is  to 
employ  a  one-dimensional  correlation  algorithm  and  implement  it  with  a  parallel  architecture. 


The  following  strategy  is  proposed  for  the  two  planes  interaction. 


I.  Logspiral  Plane  (Computation  Plane) 

1.  Apply  the  algorithm  previously  developed  to  both  angle  (column)  and  radius  (rowi 
direction. 

a) .  If  the  estimates  are  not  consistent  within  some  degree,  then  some  kind  of  motion  must 

have  occurred.  Signal  the  rectangular  plane  to  start  operation. 

b) .  If  the  estimates  are  consistent  within  some  degree,  then  the  object  experiences  onlv  ro¬ 

tation  or  scaling  or  both.  Thus,  there  is  no  2D  iranslauon  involved.  .No  operauon  is 
necessary  in  the  rectangular  plane. 

c) .  If  the  estimates  are  zero,  then  nothing  has  happened.  That  is.  no  motion  whatsoever  has 

taken  place. 

2.  Receive  a  start  signal  from  the  rectangular  plane.  Then,  apply  the  same  procedure  as 
used  in  step  1 . 

3.  Repeat  step  2  until  the  algorithm  converges. 


II.  Rectangular  Plane  (Image  Plane) 

1.  Receive  a  start  signal  from  the  logspiral  plane.  Then,  take  the  following  actions: 

a) .  Calculate  m  and  .sy  using  the  same  procedure  as  before  using  some  tracking  windows 

at  a  relatively  low  “resolution”  or  using  the  windows  at  the  onenution  and  scaling 
suggested  by  the  computation  plane  to  narrow  down  the  region  of  operation. 

b) .  Look  at  the  most  consistent  estimates  which  are  the  best  estimates  at  this  point. 

c) .  Update  the  original  reference  according  to  the  best  ax  and  av 

d) .  Remap  the  reference  via  logarithmic  transformation. 

e) .  Signal  the  logspiral  plane  to  stan  the  operation. 

2.  Receive  a  start  signal  from  the  logspiral  plane.  Then,  take  the  following  actions: 

a) .  Generate  a  finer  window  “resolution”  according  to  where  the  most  consistent  esumate 

(step  1)  occurs  or  according  to  the  angle  aixl  scaling  infonmauon  from  the  logspiral 
plane,  again,  to  narrow  down  the  region  of  operation 

b) .  Calculate  m  and  Ay  using  the  same  procedures  as  in  step  1.  but  this  time  use  the  win¬ 

dows  generated  in  a). 

c) .  Repeat  part  b  through  e  m  step  1 

3.  Repeat  step  2  until  the  algonthm  converges 


The  above  sirategv  vvill  be  used  dunng  a  real  image  simulaticn.  Nonce  that  in  step  1.  the  compu- 
tauon  plane  may  or  may  not  give  any  information  about  rotation  and  scaling.  This  depends  on 
how  inconsistent  the  estimates  in  the  computation  plane  are.  We  shall  pursue  this  matter  in  the 
simulauon  section. 


4.3.1.  Simulations  with  a  Real  Image 

The  technique  proposed  above  was  tested  with  a  real  image  taken  from  a  COD  camera.  The 
object  IS  a  picture  of  the  deck  of  a  submarine  Fig.  4.5.  The  background  is  dark  with  several 
lighter  w  ide  spots,  in  various  degrees  of  intensity.  The  image  mounted  on  a  metal  plate,  is  placed 
on  top  of  a  vertical  steel  rod,  which  can  be  rotated  manually,  in  which  angles  can  be  read  accu¬ 
rately  by  means  of  a  vernier,  to  a  fraction  of  a  degree.  The  camera  is  placed  directly  above  the 
object.  It  is  mounted  on  a  steel  camera  holder  which  can  be  moved  vertically.  The  object  is 

placed  at  an  arbitrary  position  initially.  The  image  is  then  taken  with  a  resolution  of  512x512. 
This  represents  our  first  frame.  The  second  frame  is  obtained  by  first  manually  moving  the  object 
in  the  I  direction  by  0.5  cm  and  in  the  >  direction  by  0.5  cm.  The  vertical  steel  rod  is  then  rotated 
by  5"^.  Finally,  the  camera  is  moved  vertically  up  5  cm.  The  image  is  then  taken.  Thus,  we  have 
the  second  frame  which  represents  the  target,  whereas  the  first  one  is  our  reference,  of  course. 
Notice  that  the  background  is  rotated  and  scaled  along  with  the  object  for  this  particular  experi¬ 
ment  set  up. 

In  order  to  be  able  to  check  whether  the  algorithm  performs  satisfactorily,  we  need  to  know 
to  what  number  of  pixels  in  the  x  and  >  directions  and  to  what  and  k,  the  physical  motion 
experienced  by  the  object  corresponds.  This  is,  in  a  sense,  a  “calibration”  procedure.  In  this  way, 
we  can  compare  the  known  ax,  4>,  and  d8  and  k  to  the  results  obtained  from  the  algorithm.  This 
was  achieved  using  standard  image  processing  techniques.  The  following  are  all  parameters 
resulting  from  those  procedures. 

Background’s  location  (relative  to  an  image  coordinate) 

Top  left  comer  (96,153)  Top  right  comer  (96,342) 

Bottom  left  comer  (327,153)  Bottom  right  comer  (327,342) 

Object’s  location  (relative  to  an  image  coordinate) 

Top  left  comer  (142,179)  Top  right  comer  (142,305) 

Bottom  left  comer  (298,179)  Bottom  right  comer  (298.305) 

Background  size  9x9  cm  =  189x232  pixels 

Object  size  6x6  cm  =  i26xi56  pixels 

The  four  actual  motion  parameters  are 

Ax  =  0.5  cm  =  -1-5  pixels 
A>  =  0.5  cm  =  -4  pixels 
A0  =  5.O°  =  -(-ll° 

K  =  5.0  cm  =  0  84  (dimension  reduced) 

Before  applying  the  algorithm,  we  will  set  the  tracking  window  to  the  size  of  the  object  plus  10% 
on  each  side.  The  tracking  window  will  be  used  exclusively  in  the  rectangular  plane.  The 
logspiral’s  computation  plane  used  consists  of  arcs  of  rings  configuration  with  60  rings,  64 


elements  per  ring,  and  an  exponential  spacing  constant  of  1.04.  We  are  now  ready  to  apply  the 
algorithm  for  estimating  the  perturbations. 

First,  we  apply  the  algorithm  in  the  computation  plane.  Tlie  results  show  in  the  first  column 
of  row  and  column  estimates  Table  4.IV.  As  can  be  seen,  the  estimates  in  both  columns  are  not 
consistent  to  within  some  preset  threshold.  However,  they  are  telling  us  that  the  object  has 

360 

become  smaller.  Furthermore,  it  rotates  by  about  12°  (a.ux — ).  The  rectangular  plane  uses  this 

64 

information  by  setting  its  windows  conservatively  at  3°,  6°,  9°,  12°,  and  15°  for  rotation  and  .75, 
.80,  .85,  .90,  .95  for  the  scaling  factor.  One  could  have  narrowed  the  operating  area  a  little  more 
if  desired.  The  results  for  the  first  iteration  are  shown  in  Table  V(a,b),  whereas,  their  correspond¬ 
ing  histograms  are  illustrated  in  Fig.  4.6(a,b).  From  the  table,  the  horizontal  and  vertical  perturba¬ 
tions  are  2.67  for  Af  and  -0.906  for  ay.  The  second  iteration  then  proceeds  from  this  point  using 
the  exact  same  method  as  before.  During  this  iteration,  again,  the  estimates  in  the  computation 
plane  can  be  used.  Notice  that  they  are  getting  more  consistent  and  that  the  angle  information  is 
around  10°  now.  The  rectangular  plane  uses  finer  resolution  both  in  orientation  and  in  scaling 
(see  Table  4.VI(a,b)  and  Fig.  4.7(a,b)).  From  the  tables,  at  arwl  a/  equal  to  1.83  and  -1.94,  respec¬ 
tively.  During  the  third  and  fourth  iterations,  the  estimates  are  more  or  less  the  same  (see  column 
3  and  4  for  both  row  and  column  estimates,  in  Table  4.rV).  The  rectangular  plane  also  processes 
for  two  more  iterations  before  it  converges.  Table  4. VII  summarizes  the  estimates  in  the  rec¬ 
tangular  plane.  The  overall  estimates  for  horizontal  and  vertical  perturbations  are  4.565  for  a*  and 
-3.834  for  Ay.  The  rotation  and  scaling  parameters  from  both  planes  (they  agree  to  within  a  small 
percent  error)  are  approximately  10°  and  0.83,  respectively. 

The  results  arc  very  satisfactory  although  they  arc  not  exact  as  predicted.  Actually,  we  do 
not  know  exactly  what  the  true  perturbations  are  due  to  inevitable  errors  in  the  translation 
and  scaling  displacements,  specially.  Furtheratore,  in  most  tracking  applications,  the  estimates 
do  not  need  to  be  exact  but  they  must  be  accurate  to  within  some  percents.  These  arguments  also 
apply  to  the  iteration  processes.  That  is,  the  rectangular  plane  does  not  have  to  use  its  windows 
from  3°  to  15°  for  rotation  and  .75  to  .95  for  the  scaling  factor.  Instead,  it  could  have  used  finer 
resolution  in  the  first  place.  This,  of  course,  depends  on  how  useful  the  information  from  the 
computation  is.  If  the  estimates  are  too  inconsistent,  the  computation  plane  is  better  not  to  tell 
anything  and  vice  versa.  The  degree  of  inconsistency  of  the  estimates  in  turn  depends  on  how 
large  the  motion  is.  If  it  is  not  too  large  as  in  this  case,  then  estimates  are  not  quite  inconsistent 
as  they  ought  to  be.  For  a  fast  sampler,  this  could  well  be  the  case.  Notice  that  the  degree  of 
inconsistency  of  the  estimates  can  be  observed  from  Table  4.IV.  As  can  be  seen,  as  the  2D  trans¬ 
lation  is  getting  smaller  (by  updating  the  reference),  the  more  consistent  the  estimates  will  be. 


4.4.  Speed  Comparison 

In  this  section,  we  compare  the  speed  of  one-dimensional  correlation  and  conventional 
correlation.  Although  speed  comparison  with  other  algorithms  will  not  be  made,  the  analysis  may 
be  conducted  in  a  way  similar  to  the  one  used  here.  Parallel  processing  for  both  algoritluns  will 
not  be  considered.  We  will  use  a  serial  processor  for  both  of  them. 

4.4.1.  Conventional  Correlation 

As  is  well  known,  direct  correlation  can  be  used  to  find  target  perturbations.  This  technique 
requires  a  tremendous  amount  of  processing  time.  An  advantage  of  the  technique  is  that  a  solu¬ 
tion  is  guarantee  as  long  as  we  are  willing  to  wait.  For  a  large  image  size  («>i2s),  the  correlation  is 
carried  out  in  the  frequency  domain  rather  than  in  the  spatial  domain.  The  images  that  need  to  be 
correlated  arc  transfomed  to  the  frequency  domain  using  the  Fast  Fourier  Transform  (FFT).  The 
matrix  (complex  number)  multiplication  is  then  performed  on  the  resultant  images.  The 
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Table  4.V(b)  First  Iteration  Ax=5.0,Av=^.0.A&=11.0’.)c=.84 
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Table  4.VI(a)  Second  Iteration  ^c=5.0,Av=— 4.O,A0=1  1.0^,k'=.84 
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Table  4.VI(b)  Second  Iteration  At=5.O,A>’=^.O,A0=ll.O°.)c=.84 
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maximum  frequency  spectrum  must  be  then  located  in  order  to  compute  the  perturbations.  The 
following  is  a  number  of  operations  required  to  compute  the  correlation. 

1 .  Number  of  operations  for  FFT  =  2n\oi^n 

2.  Number  of  operations  for  complex  matrix  multiplication: 

Number  of  additions  = 

Number  of  multiplications  = 

4.4.2  One-Dimensional  Correlation 

The  basic  algorithm  for  the  one-dimensional  algorithm  is  indicated  in  (i2).  If  there  is  no  row 
search  performed,  then  for  one  line  we  have 

.Number  of  multiplications  =  3(i 
Number  of  additions  =  4m-i)+  i 
Number  of  division  =  i 

Thus,  for  n  lines,  we  have 

Number  of  multiplications  =  dO/i) 

Number  of  additions  =  4(i(ii-i)  +  n 
Number  of  divisions  =  n 

For  2D  motion,  the  total  number  of  operations  are 

Number  of  multiplications  =  2x«(3«)  = 

Number  of  additions  =  2x(4»i(<i-i)  +  »i)  =  8«(/,-i)  +  2ii 
Number  of  divisions  =  <i +«=  2« 

Example.  Let «  equal  to  256.  For  direct  correlation,  we  have 

FFT  computation  =  2/iiogj«  =  4096 
Matrix  multiplications: 

Number  ofadditions  =  «%-!)=  i6,7ii,680 
Number  of  multiplications  =  «’  =  16,777.216 

For  one -dimensional  correlation,  we  have 

-Number  of  multiplications  =  611^  =  393,216 
-Number  of  additions  =  8»i((i-i)  +  ii  =  522.752 
Number  of  divisions  =  2^1  =  512 

The  percentage  of  number  of  operations  used  is 

393216 

MuitipUcations  =  - xioo  =  2344% 

16777216 

522752 

Additions  «  - xioo  «  3.128% 

16711680 

As  can  be  seen,  we  save  over  95%  witfi  the  one -dimensional  correlation.  This  does  not  count  com¬ 
putation  time  for  FFT  since  the  number  of  operations  required  for  the  FFT  computation  can  be 
used  to  offset  the  number  of  divisions.  Furthermore,  complex  matrix  multiplication  usually  takes 
more  computation  time  than  a  normal  one. 
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The  above  analysis  applies  to  binary  images  since  all  rows  have  the  same  intensity  function. 
Furthermore,  it  also  applies  to  the  case  of  one-dimensional  motion  regardless  of  the  image  type. 
Thus,  no  row  search  is  required  for  both  cases.  For  a  grey  level  image,  row  search  is  neces.sary 
since  a  different  row  has  a  different  intensity  function.  Column  search,  however,  is  not  neces¬ 
sary.  The  reason  for  this  is  that  once  the  position  where  most  consistent  estimates  in  row  direction 
occur,  have  been  located,  a  column  displacement  can  be  obtained.  This  is  true  because  a  row 
mismatch  causes  by  a  vertical  shift  or  column  displacement.  One  may  apply  the  one-dimensional 
correlation  in  the  column  direction  in  order  to  confirm  the  vertical  displacement  as  we  did  to  all 
of  the  simulations. 

Let’s  include  a  row  search  in  the  analysis  above.  Let's  say  that  we  will  search  ±10%  of  an 
object  dimension.  Notice  that  the  second  term  in  (12)  only  needs  to  be  computed  one  time.  There¬ 
fore,  for  n  lines  we  have 

Number  of  multiplications  =  <i(2ji(.2ii)  +  3<i) 

Number  of  additions  =  »i(3(<t-ix.2«)  +  4((i-i)  +  (.2)t+i)l 
Number  of  divisions  =  « (.2*  + 1) 

For  n  equals  to  256,  we  have 

Number  of  multiplications  =  6,907.494.40 
Number  of  additions  =  10301,491.2 
Number  of  divisions  =  13363.2 

The  percentage  of  number  of  operations  used  is 

6907494.40 

Multiplications  - xioo  =  4i.i7% 

16777216 

10301491.2 

Additions  - xioo  =  61.64% 

16711680 

Again,  the  number  of  divisions  are  used  to  offset  with  the  number  of  operations  for  the  FFT  and 
the  additional  time  required  for  the  complex  number  multiplication  and  addition.  As  can  be  seen, 
we  do  not  save  as  much  as  when  we  compute  the  perturbations  without  a  row  search.  However, 
we  still  save  40%  or  more  of  the  computation  time.  The  point  that  we  would  like  to  make  here  is 
that  as  the  motion  gets  more  complicated,  the  processing  time  increases  tremendously.  Thus,  the 
parallel  processors  are  necessary  in  order  to  achieve  a  real-time  processing  for  3D  motion.  A  sin¬ 
gle  processor  will  not  be  able  to  accomplish  the  task.  Although  it  is  true  that  the  conventional 
correlation  algorithm  as  well  as  other  algorithms  can  be  implemented  in  highly  parallel  fashion, 
most  of  them  are  too  complex.  Thus,  it  is  very  difficult  to  design  in  such  manner.  However,  the 
one -dimensional  correlation  has  the  property  of  simplicity  and  separability.  That  is.  its 
mathematic  formula  is  simple.  Each  row  of  a  moving  area  can  be  computed  independently.  Thus, 
it  is  easy  to  design  and  implement  the  one-dimensional  correlation  algorithm  in  highly  parallel 
fashion  (see  [17]  for  harware  implementation  of  the  algorithm). 

4.5.  Conclusion 

A  row  (column)-corrclation-algorithm  for  motion  prediction, easily  implementable  by  paral¬ 
lel  architecture,  has  been  developed.  If  implemented  by  means  of  a  general  purpose  digital  com¬ 
puter,  time  consuming  process  is  expected.  However,  it  is  still  faster  than  conventional  correla¬ 
tion  technique.  The  algorithm  also  combines  many  good  feamres  of  various  algorithms  presented 
in  the  literature.  For  example,  it  has  correlation  feature  of  [19]  and  spatio-temporal  technique  of 
[15,16].  In  addition,  the  algorithm  also  possesses  a  good  noise  immunity  property.  The  compuu- 
tion  of  gradients  around  edges  or  comers  does  not  affect  the  accuracy  of  the  estimates.  The 


convergence  rate  appears  to  be  fairly  quick  for  a  large  mouon,  TTie  combinauon  of  the  aJgonihm 
and  the  BVS  sensor  with  two  planes  configurations  has  solved  the  'D  mouon  problem  The 
results  from  the  real  image  simulation  are  encouraging. 
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5.  PATTERN  RECOGNTTION  USING  THE  BVS,  THE  C-TRANSFORM  AND 
A  NEUTRAL  NETWORK  CLASSIFIER 

5.1.  Introduction 

A  major  problem  in  pattern  recognition  is  object  recognition  when  position  and 
orientation  vary  in  the  image  field.  Typically,  the  problem  is  solved  by  template  match¬ 
ing.  The  template  is  compared  to  the  image  (usually  by  taking  some  sort  of  correlation), 
and  it  is  then  shifted  to  another  location  and  again  compared.  This  process  is  repeated 
until  all  possible  locations  are  covered.  If  more  than  one  object  is  sought,  then  more 
templates  must  be  used.  The  case  with  the  highest  correlation  is  then  selected  as  the 
recognized  object.  This  process,  while  fairly  accurate,  is  very  time  consuming  even 
when  small  image  arrays  are  used. 

In  the  biological  visual  system  (BVS)  a  complex  logarithmic  conformal  mapping 
takes  place  between  the  retina  and  pans  of  the  conex  (specifically  area  17).  The  out¬ 
standing  feature  of  the  BVS  sensor  is  form  invariance  under  magnification  and  rotation  in 
computation  plane.  The  use  of  a  translation  invariant  transform  in  computation  plane 
would  shift  the  object  to  a  standard  location  regardless  of  the  rotational  orientation  or 
size  of  the  object  in  image  plane.  This  would  allow  faster  pattern  recognition  of  objects 
which  is  not  dependent  on  rotations  and  magnifications. 

A  class  of  translation  invariant  transforms,  the  C  -  transforms,  possesses  the 
desired  characteristics  plus  other  features  which  make  this  class  useful.  The  functions 
involved  are  very  simple,  and  the  transforms  themselves  exhibit  a  high  degree  of  paral¬ 
lelism.  This  transform  can  be  performed  using  a  highly  parallel  artificial  neural  system 
(ANS)  for  greater  speed  and  reliability. 

This  class  of  transforms  uses  identical  functional  layers  of  simple  operations  that 
can  be  done  very  quickly  without  computationally  intensive  multiplications  and  divi¬ 
sions.  Using  this  parallel  structure,  the  transforms  provide  a  method  of  transforming  any 
shifted  image  to  a  standard  form,  that  can  be  used  with  a  template  matching  scheme  for 
pattern  recognition.  The  major  advantages  of  such  a  system  are  its  simplicity,  highly 
parallel  nature,  fast  operation,  and  the  computational  gain  derived  from  the  reduced 
complexity  of  the  template  matching. 

5.2  Background 

The  C-transform  class  is  a  class  of  nonlinear  translation  invariant  transforms  that 
can  be  implemented  with  simple  parallel  networks  of  computational  elements.  Reitboeck 
and  Brody  (1969)[1]  developed  the  R-transform,  the  first  known  member  of  this  class. 
The  R-transform  could  be  implemented  using  simple  functions  (sum  and  absolute  differ¬ 
ence)  in  a  parallel  arrangement  similar  to  that  of  the  Fast  Fourier  Transform  (Fig.  5.1). 

The  transform  can  be  used  on  vectors  of  length  2*  where  n  in  an  integer.  The  actual 
calculation  of  the  transform  can  be  expressed  as: 

Symmetric  Functions /,  and  over  sequence: 

a(/)  ,  /=0.1.....2''-1  (1) 

The  K*  component  of  the  transform: 


Figure  5.1.  C-Transform  Flow  Graph  (21 


A(K)  . 

let  the  n  -bit  binary  expansion  of  A"  be  k^jc  .  jc, 
find  sequences  )'o(0.>, (/)..  .>',(/)  by: 

y^il)  =  a(l) 

and 

where 
/.=/,  'f 

or 

f.=f2  'f  *.  =  > 

The  A**  component  of  the  transform  is  then. 
A(K>=y^(K) 
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This  is  the  formulation  of  the  transform  for  the  one  dimensional  case.  The  exten¬ 
sion  into  two  dimensions  provides  a  similarly  simple  representation  of  the  transform; 

Let  A  (fjy,  . 2"-l  be  an  array. 

Construct  matrices  y  V,...,y*  such  that: 


y\ij)  =  Aujy, 

(8) 

Let 

y''\ij)  =  p 

(9) 

y'~\i-^2''''j)  =  Q 

(10) 

Y'~\iJ+2'"^)  =  R 

(11) 

y"’(/^2*-V^2-‘)  =  s 

(12) 

Then  form  matrices  by: 

(13) 

k'(2/ ,2/^1)  =/,(/,(/»  ,2)/, (/?^)) 

(14) 

Y'm*\2J)--^f,{J^{P,Q)J2{R^)) 

(15) 

y  ’  ( 2/ ^  1 2/ + 1 ) = / i(/ 1(/>  ,(2 )  / 1(/?  J ) ) 

(16) 

for/j=0,l . 

The  Two-D  transform  is  then: 


XdJ)  =  y'i/y):  ij=0,\....f (17) 

Wagh  and  Kanetkar  ( 1975)[31  examined  the  transform  in  more  detail  and  developed 
a  form  called  the  Generalized  R-transfona  They  also  made  an  extension  of  the  two 
dimensional  R-transform  into  arrays  that  were  rectangular,  not  only  the  square  arrays  of 
Reitboeck  and  Brody. 

Wagh  and  Kanetkar  (1977)[21  generalized  this  translation  invariant  transform  from 
the  R- transform’s  sum  and  absolute  difference  functions  to  any  set  of  two  functions  that 
are  argument  symmemc.  TTicy  make  a  convincing  case  for  the  use  of  the  M-transform  on 
the  grounds  that  at  higher  order  it  has  greater  accuracy  (more  possible  classes)  than  the 
R  transform.  The  M-transform  also  provides  binary  outputs  for  binary  inputs,  which 
leads  to  a  smaller  transform  volume  (and  therefore  less  memory  used)  as  compared  to  the 
R  transform  which  does  not  provide  binary  outputs.  They  also  claim  that  the  functions 
used  in  the  M  transform  (logical  OR  and  ANT))  arc  faster  to  implement  than  the 
Lorresponding  R-transform  functions.  The  binary  nature  of  the  .M-transform  makes  it 
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more  attractive  for  ANS  implementation. 

Burkhardt  t  ad  Muller  (1980)[4]  did  a  much  more  in  depth  mathematical  examina¬ 
tion  of  the  properties  of  this  class  of  transforms.  They  included  the  R  and  M  transforms 
as  well  as  two  others  in  their  analysis  of  the  translation  invariant  mapping  propenies  of 
the  transform  class. 

Reitboeck  and  Altmann  (1984)[5]  discuss  the  use  of  the  R-transform  in  a  detailed 
model  of  magnification  and  rotation  invariance  in  the  BVS.  They  compare  the  R- 
transform  to  other  methods  (Mellin  and  Fourier-Mellin  transforms)  as  they  could  possi¬ 
bly  exist  in  the  BVS.  Their  model  covers  the  log-polar  mapping  of  information,  the 
DOG  smoothing  function  that  is  similar  to  the  receptive  field  weighting  in  the  retina,  and 
various  processing  (thresholding  and  edge  detection)  that  is  useful  in  obtaining  reason¬ 
able  results.  They  also  suggest  that  these  transforms  can  be  implemented  using  an  ANS. 
They  go  to  a  great  length  to  justify  the  use  of  these  transformations  in  pattern  recognition 
in  their  detailed  BVS  model. 

5.3.  R  and  M  Transform  Advantages  and  Disadvantages 

The  R  and  M  transforms  exhibit  a  number  of  advantages  and  disadvantages  that  are 
relevant  to  pattern  recognition  problems.  These  will  also  influence  choices  that  are  made 
concerning  implementation. 

An  advantage  demonstrated  by  the  R-transform,  as  mentioned  earlier,  is  that  it  can 
work  on  continuous  as  well  as  binary  information.  This  would  allow  grey  level  process¬ 
ing  of  images  under  certain  circumstances.  Another  advantage  is  that  the  functions 
involved,  sum  and  absolute  difference  are  accomplished  rather  quickly  in  computer 
implementations. 

The  R-transform  also  has  a  number  of  disadvantages,  one  of  the  most  notable  being 
that  the  transform  volume  grows  very  quickly.  This  means  that  the  numbers  in  the 
transformed  vector  or  matrix  are  typically  much  larger  than  the  values  in  the  input  vector 
or  array.  This  makes  it  necessary  to  use  more  computer  storage  space  to  perform  this 
transform.  Another  aspect  of  this  problem  is  that  this  characteristic  leads  to  non-binary 
outputs  for  binary  inputs.  This  could  lead  to  difficulties  in  pattern  recognition  schemes. 
In  higher  order  problems,  the  R-transform  has  fewer  distinct  classes  of  transforms  than 
the  M-transform.  This  means  that  the  accuracy  of  transformation  is  lower  for  the  R- 
transform  when  the  problem’s  order  is  high. 

The  M-transform  has  several  advantages  when  compared  to  the  R-transform.  The 
transform  volume  of  the  M-transform  is  much  more  limited  than  the  R-transform  and  is 
on  the  same  order  as  the  input  volume.  The  functions  involved,  logical  AND  and  OR, 
are  usually  among  the  fastest  implemented  on  the  computer,  and  are  probably  faster  than 
the  corresponding  R-transform  functions.  The  M-transforra  provides  binary  outputs  for 
binary  inputs,  so  there  is  no  compatibility  problems  with  other  parts  of  a  pattern  recogni¬ 
tion  system.  In  low  order  problems,  the  M-transform  has  fewer  distinct  output  classes 
than  the  R-transform,  but  as  the  input  order  increases  this  changes  so  that  the  M- 
transform  is  more  accurate. 

The  major  disadvantages  of  the  M-transform  include  the  necessity  for  binary  input 
information.  The  M-transform  cannot  use  continuous  data.  Another  disadvantage  is  due 
to  the  functions,  OR  and  AND,  which  are  sometimes  difficult  to  implement  using  some 
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higher  level  programming  languages. 

5.4.  Computer  Simulation  Results 

Initial  testing  was  done  on  the  R  and  M-transforms  by  generating  a  large  number  of 
8  dimensional  binary  vectors.  This  vector  set  was  used  to  test  the  propenies  of  the 
transforms.  This  was  done  primarily  to  determine  if  the  theory  was  correct  in  all  of  its 
claims. 

The  R  and  M-transforms  were  both  tested  with  these  vectors  and  shifted  vectors, 
including  cyclic  shifts,  producing  the  same  transformed  pattern,  as  expected.  Advan¬ 
tages  and  disadvantages  of  both  transforms,  mentioned  earlier,  were  also  confirmed. 
Further  testing  was  performed  on  8  dimensional  binary  vectors,  this  time  on  a  set  of  all 
256  possible  8  dimensional  vectors.  The  major  reasons  for  this  experimentation  was  to 
determine  transform  volume  and  the  number  and  types  of  classes  present  in  the  output  of 
the  R  and  M-transforms. 

The  R-transform  was  performed  on  this  set  of  256  vectors  and  the  results  in  terms  of 
translational  invariance  were  unchanged.  The  maximum  input  vector  volume  was  8  (for 
the  all  Is  case)  and  the  maximum  output  transform  volume  was  20.  The  output  was  not 
binary  just  as  in  the  preliminary  testing.  The  number  of  distinct  classes  found  was  21.  so 
the  R-transform  divides  the  256  8  dimensional  binary  vectors  into  21  classes. 

The  M-transforms  of  this  vector  set  again  demonstrated  the  invariance  to  transla¬ 
tions  as  shown  by  the  previous  inquiry.  The  maximum  input  vector  volume  was  8  and 
the  maximum  output  vector  volume  was  also  8.  The  output  was  binary,  and  the 
transformed  vector  is  identical  to  the  input  vector  (shifted  to  a  standard  location).  The 
transfoim  yields  20  distinct  classes  of  outputs  for  the  256  binary  vector  inputs. 

In  comparison,  the  time  performance  is  very  similar  for  the  transforms,  both  of 
which  are  very  fast.  The  transform  volume  is  much  greater  for  the  R-transform,  and  this 
problem  will  continue  to  worsen  as  the  order  of  the  problem  increases.  The  R-transform 
provides  an  output  that  is  continuous  while  the  M- transform’s  output  is  binary.  The  R- 
transform  does  not  resemble,  even  superficially,  the  input,  while  the  M-transform  is 
identical  (but  often  shifted)  to  the  input.  The  number  of  distinct  classes  at  this  low  order 
is  slightly  greater  for  the  R-transform,  but  as  the  order  increases  this  will  reverse  and  the 
M-transform  will  become  more  accurate. 

5.5.  Simulation  of  2  Dimensional  Case" 

The  next  stage  of  simulation  was  done  to  test  the  2  dimensional  R  and  M- 
transforms.  8x8  binary  arrays  were  used  as  inputs  to  the  2  dimensional  transforms.  The 
images  tested  were  binary  4x4  squares  in  the  8x8  array.  The  square  was  shifted  to  25  dif¬ 
ferent  locations  in  the  8x8  array  to  determine  the  translational  invariance  of  the 
transforms. 

The  R  and  M-transforms  gave  the  same  output  for  all  25  4x4  squares,  which 
verifies  the  invariance  to  translations  of  these  transforms.  The  volume  of  the  R-transform 
was  quite  large  compared  to  the  input  volume.  The  M-transform  volume  was  the  same  as 
the  input  volume,  and  much  smaller  than  the  R-transform  volume. 

The  R  and  M-transform  performance  was  examined  for  higher  order  2  dimensional 
arrays.  The  purpose  was  to  determine  the  characteristics  of  the  transforms,  specifically  to 


verify  properties  and  to  test  the  sensitivity  of  the  process.  Three  classes  of  objects  were 
considered,  lines,  squares,  and  circles.  The  images  contained  one  of  the  three  at  some 
location  in  the  image  field.  The  sensitivity  of  the  transforms  was  tested  by  varying  the 
width  of  the  lines  and  object  sizes  slightly  and  examining  the  differences  in  the  resulting 
transforms.  As  before,  the  images  were  binary,  and  the  array  size  was  64x64.  This  size 
was  chosen  because  it  meets  the  2*  requirement  and  is  easily  done  using  the  BVS  sensor. 

The  R-transform  of  the  higher  order  arrays  again  demonstrated  the  shift  invariant 
nature  of  the  transform.  The  continuous  output  had  a  volume  much  greater  than  either 
the  input  volume  or  the  lower  order  simulations.  The  output  arrays  had  some  sensitivity 
to  line  width  and  object  size,  i.  e.  slight  variations  gave  slightly  different  transforms. 
This  variation  in  the  transform  increases  in  proportion  with  the  variation  in  the  image. 
This  sensitivity,  while  bothersome,  is  certainly  within  limits  of  adjustment  for  pattern 
recognition.  The  actual  recognition  algorithms  must  be  adjusted  to  tolerate  some  vana- 
tions. 

Tests  using  the  M-transform  provided  similar  verification  of  the  translational  invari¬ 
ance  properties.  The  output  in  this  case  was  binary,  and  the  volume  was  of  the  same 
order  as  the  input  volume.  The  M-transform,  like  the  R-transform,  showed  some  sensi¬ 
tivity  to  line  thickness  and  slight  variations  of  size,  but  as  before,  these  could  be  com¬ 
pensated  for  in  the  recognition  algorithms.  The  output,  unlike  the  R-transform  output, 
looks  like  the  object  in  the  input  array. 

There  are  several  problems  with  using  arrays  of  this  size,  primarily  due  to  memory 
space  usage.  While  binary  images  take  up  much  less  space  than  grey  level  images,  the 
size  of  these  (64x64  or  larger)  prevents  comparison  of  more  than  a  small  number  at  once. 
This  immediately  eliminates  certain  types  of  pattern  recognition  schemes.  Another  prob¬ 
lem  is  that  even  after  the  scene  has  been  "standardized"  by  the  transform,  it  still  poses  a 
pattern  recognition  problem  of  fairly  high  order,  so  even  though  the  processing  is  drasti¬ 
cally  simplified  it  is  still  fairly  extensive. 

5.6,  Use  of  Transforms  With  BVS  Sensor 

The  R  and  M  transforms  must  be  used  on  square  or  rectangular  arrays  so  the  only 
compatible  computational  plane  is  the  arc  of  ring  arrangement.  This  gives  a  rectilinear 
computational  plane  that  can  be  adjusted  to  the  proper  square  2'  array  size. 

An  advantage  of  the  transforms  with  the  BVS  sensor,  as  mentioned  earlier,  is  that 
under  the  LSM  scalings  and  rotations  become  translations,  so  these  transforms  can  be 
used  to  standardize  all  magnifications  and  rotations  to  a  single  image  for  template  match¬ 
ing. 

The  choice  of  the  C-transforms,  specifically  the  M-transform,  for  ANS  implemen¬ 
tation  is  motivated  by  a  number  of  reasons.  The  primary  reasons,  as  mentioned  above, 
are  the  simplicity  of  function  and  parallelism.  The  transforms  can  be  performed  by 
identical  layers  of  simple  elements.  Another  aspect  which  reinforces  the  suggestion  of 
ANS  implementation  is  that  the  transforms  have  a  known  input/output  relationship,  so 
performance  can  be  evaluated  easily.  The  M-transform  uses  only  binary  information  for 
the  vector  or  image  representation,  so  the  high  gain  sigmoid  transfer  function  used  in 
many  ANS  will  provide  a  binary  output.  And  finally,  the  simulations  can  be  developed 
in  fairly  low  order  and  can  then  be  easily  generalized  to  two  dimensions  or  to  higher 


order  problems.  These  traits  of  the  M-transfonm  make  it  a  reasonable  candidate  for  ANS 
implementation. 


5.7.  Network  Structure  and  Weight  Matrix 

An  ANS  implementation  of  the  M-transform  utilizes  the  processing  elements  (PEs) 
as  threshold  logic  gates  to  perform  the  functions  of  AND  and  OR.  This  means  that  a 
high  gam  is  necessary  to  make  them  behave  properly.  The  connection  weights  can  be 
determined  quite  easily  for  the  vector  case,  since  each  PE  has  two  inputs. 

In  the  AND  case,  the  PE  should  go  to  one  if  and  only  if  both  inputs  are  one,  so  the 
inputs  should  be  weighted  such  that  their  weighted  sum  is  greater  than  0.5,  but  their  indi¬ 
vidual  weights  are  less  than  0.5.  A  selection  of  0.4  for  the  AND  PE  case  should  provide 
this  function.  In  the  OR  PE  case,  the  output  should  be  one  if  any  input  is  one,  so  the 
weights  should  be  chosen  such  that  any  input  of  one  will  give  an  element  input  of  more 
than  0.5.  The  choice  was  0.6  for  the  OR  PE  weightings,  which  made  it  possible  for  the 
weighted  sum  of  the  inputs  to  be  greater  than  one.  This  made  choice  of  the  transfer  func¬ 
tion  harder. 

The  form  of  the  network  can  be  seen  as  Fig.  5.2. 

As  shown,  the  network  is  not  symmetric,  as  is  the  case  in  a  Hopfield  network.  But 
in  this  case  there  is  no  feedback,  so  stability  is  assured.  Finite  inputs  yield  finite  outputs. 
The  binary  nature  of  the  solution  can  be  demonstrated  as: 


Inputs  : 

i^  <  \  so 

(18) 

Element  inputs  : 

X  =  Yt  w<  ]. 2 

J 

(19) 

Element  outputs 

:  =  If  ij:,  )  <  1 

(20) 

Element  inputs 

=  TV>  ,  <  1.2 

(21) 

Final  outputs 

=  If  1  ■'i ,  •  ^  1 

(22) 

In  the  demonstration  above,  s  1  means  that  the  element  outputs  that  should  be  one 
are  very  close  to  one  due  to  their  input  values  and  the  sigmoid  transfer  function.  The 
transfer  function  must  have  the  property  of  mapping  values  of  input  that  are  greater  than 
1  to  an  output  of  1  (or  very  nearly  1),  as  well  as  the  normal  sigmoid  characteristic  about 
0.5.  This  is  accomplished  by  limiting  the  input  side  of  the  PEs  to  only  small  values 
greater  than  1  (in  this  case  1.2)  and  to  use  only  even  powers  of  gain  The  transfer  func¬ 
tion  actually  used  is: 
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Where  V'(l  j)  is  the  output  of  element  j  on  slab  1.  x{\j)  is  the  input  to  element  y  on 
slab  1  and  p  is  the  gain  of  the  function.  The  evenness  requirement  is  to  prevent  the  out¬ 
put  from  becoming  greater  than  1.  In  order  to  make  the  network  behave  properly  a 
number  of  other  functions  must  also  be  specified.  The  I-functions  relate  the  scaling  of 
the  incoming  signals  by  the  proper  weights,  and  the  F-function  determines  how  the  I- 
functions  arc  combined.  There  are  two  I-functions,  one  corresponding  to  the  external 
inputs  to  the  elements  (slab  0  to  slab  1  connections),  and  the  other  corresponding  to 
feedback  to  the  elements  (slab  1  to  slab  1  connections).  The  functions  used  are  shown 
below; 


/(l.O)  =  V(O.OH'(l.y.O,i) 


(24) 


/(l.l)  =  V(l./)w(l,y.l.i) 


(25) 


and 


F(\)  =  /(l.O)  +  /(l.l) 


(26) 
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With  the  network  structure  and  weight  matrix  described  above,  the  M-transform 
was  performed  on  4  dimensional  binary  vectors.  We  would  expect  to  see  the  transla¬ 
tional  invariance  of  the  transform,  as  well  as  very  fast  operation  of  the  parallel  arrange¬ 
ment.  The  epsilon  step  size  can  be  1  since  the  output  is  essentially  a  function  of  the  input 
directly  and  not  of  the  past  as  in  the  TSP.  For  all  4  dimensional  inputs  the  results  are 
summarized  in  the  following  table  (Fig  5.3.): 

This  table  clearly  shows  the  translational  invariance  of  the  M-transform.  In  the  case 
of  a  translated  vector  (for  example  0(301  and  0100)  the  output  is  the  same  as  the  non- 
translated  case  (both  are  0001).  This  just  as  we  expected  from  the  previous  discussion. 
The  transform  is  also  very  fast,  depending  upon  the  choice  of  epsilon.  The  limit  (epsilon 
of  1 )  is  convergence  in  two  steps  due  to  the  two  layer  parallelism  of  the  transform. 

The  transform  can  also  be  expressed  in  a  compressed  form  and  extended  into  two 
dimensions. 

5.8.  Reduced  Transform  Representation 

It  is  desired  to  develop  a  reduced  representation  for  this  transform  so  that  it  may  be 
determined  whether  any  computational  savings  can  be  obtained  through  another  expres¬ 
sion  of  the  transform.  This  would  involve  using  the  PEs  as  more  than  just  two  input 


M-transform  Results  :  4  Inputs 
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0001 
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0111 

1100 

0011 
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1110 
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Figure  5.3.  4  Dimensional  Vector  M-transform  Results. 


threshold  logic  gates.  It  seems  possible  that  a  compressed  M-transform  representation 
may  use  fewer  elements  or  fewer  layers  than  the  normal  arrangement. 

The  nature  of  the  transform  is  examined  and  the  equations  for  the  outputs  are  deter¬ 
mined  in  terms  of  the  inputs.  In  the  4  input  case,  this  procedure  can  be  summarized  as; 


Inputs  :  /,,  /,.  /j,  (27) 

Outputs  :  M My  M^  (28) 

^  +  ^3  +  ^2  +  ^  (29) 

=  (/,  +  /jX/j  +  (30) 

M3  =  1,1,  +  //,  (32) 

^#4  =  (33) 


Where  the  "add"  and  "multiply"  operations  are  logical  OR  and  AND. 

The  above  equations  demonstrate  that  two  of  the  functions,  M ,  and  M^,  can  indeed 
be  compressed  because  they  consist  of  only  one  type  of  function.  The  other  two,  M^  and 
Mj,  cannot  be  done  using  a  single  element.  Each  of  these  is  a  comparison  of  two  binary 
quantities  that  are  in  turn  comparisons  of  two  binary  quantities.  This  means  that  since  an 
element  is  needed  for  each  comparison,  3  elements  are  necessary  for  each  of  Mj  and  M,. 

This  total  of  8  elements  is  the  same  as  in  the  noncompressed  case,  so  no  computa¬ 
tional  advantage  can  be  gained  in  the  number  of  elements  used.  Furthermore,  Mj  and  Mj 
are  two  layers  deep,  so  there  will  be  no  time  improvement  either.  Actually,  the  original 
case  is  superior  because  all  of  the  elements  reach  the  proper  outputs  at  the  same  time, 
since  they  all  have  the  same  depth  (number  of  levels). 

This  means  that  while  expectations  were  for  a  more  efficient  use  of  network  struc¬ 
ture,  the  best  arrangement  proved  to  be  the  original  parallel  layers  of  identical  structure. 

5.9.  Extension  into  Two  Dimensions 

The  transform  extension  into  2  spatial  dimensions  can  be  accomplished  in  the 
manner  described  by  equations  8-17.  The  transform  flow  in  the  2  dimensional  case  con¬ 
sists  of  layers  of  arrays  of  elements,  with  each  layer  having  the  identical  functional 
arrangement. 

Unfortunately,  the  functions  present  in  the  2  dimensional  case  are  actually  combi¬ 
nations  of  the  simple  functions  of  the  1  dimensional  case  (see  equations  13-16).  This 
leads  to  problems  like  those  faced  when  a  reduced  transform  representation  was  sought, 
i.e.  it  takes  3  PEs  to  make  up  one  functional  "element".  This  problem  is  not  actually  as 
bad  as  it  first  seems  because  the  initial  functions  have  quite  a  bit  of  overlap.  This  means 
that  the  output  of  the  first  PEs  in  a  functional  element  is  needed  by  2  functional  elements. 
This  allows  implementation  of  the  2  dimensional  M-transform  using  only  twice  as  many 
PEs  as  functional  elements. 


Given  this  requirement,  the  4x4  2  dimensional  M-transform  can  be  performed 
using  64  PEs  as  2  input  threshold  logic  gates.  The  binary  input  signal  enters  from  slab  0 
and  proceeds  through  two  functional  layers  of  32  PEs  each,  which  is  actually  four  logical 
layers  of  16  PEs  each.  This  doubles  the  time  to  convergence  for  the  .M-transform. 

The  same  functions,  the  g,  I,  and  F-funciions,  are  used  in  the  2  dimensional  case 
as  were  used  in  the  vector  case.  The  weight  matrix  has  been  expanded  and  changed  to 
reflect  the  4  logical  layers  of  the  2  dimensional  transform.  As  in  the  previous  implemen¬ 
tation,  the  matrix  is  not  symmetric,  except  in  the  sense  that  the  two  layers  have  sym¬ 
metric  functionality.  Again  there  is  no  direct  feedback,  so  stability  is  assured. 

We  would  expect  to  see  a  transform  that  takes  twice  as  long  as  the  previous  case, 
and  provides  the  necessary  invariance  to  translations.  The  results  obtained  do  indeed 
demonstrate  these  two  features,  and  the  translational  invariance  is  demonstrated  in  Fig, 
5,4. 

A  number  of  other  input  arrays  were  used,  and  the  results  always  showed  the  trans¬ 
lational  invariance  property. 

5.10.  Implementation  of  Learning  in  an  ANS 

Neural  networks  in  living  creatures  are  often  capable  of  learning  quite  a  variety  of  * 
things,  which  gives  these  networks  a  major  advantage  over  conventional  networks  in  a 
number  of  applications.  Learning  in  a  neural  network  is  accomplished  by  the 
modification  of  the  interconnection  weights  between  the  elements.  An  ANS  that  has  the 
ability  to  learn  must  also  include  some  provision  for  the  modification  of  its  connection 
weights.  In  a  hardware  device  this  may  be  difficult  if  not  impossible  from  a  practical 
standpoint,  but  it  is  easily  accomplished  in  simulation. 

The  general  form  of  the  weight  modification  rule  employed  in  ANS  learning  is: 

w./t+l)  =  +  Aw^(0 

Where  is  the  connection  weight  from  unit  i  to  unit  j.  This  modification  can  be 
accomplished  by  three  mechanisms  used  in  determining  Aw .  They  are  broken  into  three 
groups  according  to  the  amount  of  supervision  used  in  each. 

The  first  type  of  learning  rule  is  unsupervised,  in  which  the  network  learns  correla¬ 
tions,  i.e.  if  element  i  on  is  likely  to  drive  element  j  on  the  connection  gets  stronger. 
This  rule  can  be  summarized  by  the  following  equation: 

Aw,^  =  t,/(X,Wp^(x  ,w,^) 

In  this  rule,  /k,  is  the  learning  parameter,  which  determines  how  much  the  connec¬ 
tion  weight  is  changed.  This  may  be  a  constant  or  may  vary  over  time.  Aw  is  also  a 
function  of  the  current  state  of  the  j  and  the  particular  input  and  weight  of  the  /  ele¬ 
ment.  Unsupervised  learning  is  useful  when  the  exact  desired  output  is  unknown,  or 
there  are  many  elements  that  cannot  be  directly  specified. 

The  second  type  of  learning  is  called  weakly  supervised  and  uses  a  single  error  sig¬ 
nal  to  tell  the  network  to  learn  or  not  to  learn.  This  is  like  telling  the  network  if  it  is 
wrong,  but  not  telling  it  where  it  is  wrong.  This  rule  follows  an  equation  like: 


5H 


Where  k,  is  the  learning  parameter,  as  before.  However,  now  Aw  is  a  function  of  the 
error  signal  I,  and  its  weight  w,,  as  well  as  a  function  of  the  particular  input  and  its 
weight.  This  type  of  learning  is  useful  when  *he  desu-ed  output  is  known,  but  for  some 
reason  the  elements  cannot  all  be  directly  specified  (hidden  elements)  or  a  full  error  vec¬ 
tor  is  unknown  (order  of  the  system  is  too  large,  etc.). 

The  third  type  of  learning  is  highly  supervised,  in  which  the  weight  modifications 
are  adjusted  according  to  the  amount  of  error,  i.e.  the  weights  are  changed  in  proportion 
to  their  contribution  to  the  error  for  that  element.  The  general  form  of  this  rule  is; 


Once  again,  is  the  learning  coefficient,  but  now  Aw  is  a  function  of  a  training 
vector  /j.  and  the  output  vector  v,  so  that  an  error  value  is  generated  for  each  element. 
This  differs  from  the  previous  weakly  supervised  case  in  which  an  overall  error  signal 
was  generated.  As  before.  Aw  is  some  function  of  the  individual  weight  and  input  for 
that  element  This  type  of  learning  is  used  when  the  outputs  of  each  element  are  known 
so  that  the  error  vector  can  be  generated. 

Since  the  M-transform  can  be  calculated  easily  by  a  separate  program,  all  of  the 
outputs  can  be  determined  for  the  generation  of  the  error  vector.  This  allows  the  use  of 
highly  supervised  learning  to  teach  a  simple  network  the  M-transform.  The  Aw  function 
that  was  used  is: 


Where  and  are  the  desired  and  actual  outputs  of  element  j  and  is  the  i‘*  input 
to  element  j.  The  difference  corresponds  to  the  j  component  of  the  error  vector,  and 
the  I,  term  determines  the  amount  of  contribution  to  the  output  for  that  particular  weight. 
Using  this  function  the  network  can  be  taught  the  M-transform. 

The  network  structure  used  is  the  same  as  in  the  four  dimensional  vector  case,  as 
shown  previously,  with  four  inputs  to  eight  elements.  The  I  and  F-functions  are  the 
same  as  before,  and  the  g-function  has  been  changed  to  the  hyperbolic  tangent  function 
used  by  Hopfield.  This  function  has  been  shifted  and  scaled  appropriately  so  that  0  and  1 
are  the  bounds  as  in  our  previous  case.  The  new  g-function  was  chosen  because  the  old 
power  function  does  not  asymptotically  approach  1  and  0.  This  means  that  if  the  input 
value  becomes  greater  than  1,  the  output  will  become  less  than  1,  so  that  the  weight 
update  rule  will  push  the  weight  in  the  wrong  direction.  The  hyperbolic  tangent  function 
doesn’t  have  this  problem. 

The  connection  weights  were  initially  all  zero,  as  were  the  initial  conditions.  The 
network  is  presented  with  an  input  vector  and  allowed  to  reach  a  final  state  (two  cycles). 
This  is  necessary  so  that  the  error  vector  is  generated  at  the  proper  time.  Once  the  final 
state  is  reached,  the  error  vector  is  calculated  from  the  training  input  and  the  element 
outputs.  The  weights  are  then  updated  according  to  the  error  vector,  learning  coefficient, 
and  individual  input.  Then  the  initial  condition  is  reset  and  the  next  input  vector  is 
presented  and  the  process  is  repeated. 


Since  there  are  sixteen  possible  four  dimensional  input  vectors,  the  cycle  must  be 
repeated  a  number  of  times  to  reach  a  working  network.  There  are  also  a  number  of  sta¬ 
bility  issues  associated  with  the  learning  that  influenced  the  tinal  procedure.  The  gain  of 
the  sigmoid  plays  a  significant  part  in  the  learning  process.  If  the  gain  is  too  high,  i.e.  the 
sigmiod  is  too  steep,  all  of  the  error  components  are  either  0  or  1,  and  the  system  just 
goes  from  state  to  state  never  reaching  the  desired  result.  This  forces  a  flatter  sigmoid. 
Another  factor  in  the  stability  is  forcing  the  diagonal  elements  to  remain  zero  throughout 
the  simulation.  This  helps  to  maintain  stability  by  removing  direct  feedback. 

In  addition  to  stability  issues,  there  are  several  things  that  must  be  addressed  to 
make  the  system  behave  properly.  The  first  of  these  is  the  noise  margin  of  the  system, 
i.e.  what  values  constitute  zero  and  one.  In  this  case,  a  noise  margin  of  0.15  was  chosen 
so  that  values  of  .85  and  above  are  1  and  .15  and  below  are  0.  TTiis  seems  to  be  a  reason¬ 
able  choice,  and  if  the  output  was  within  this  margin  no  learning  takes  place.  Another 
important  issue  is  the  value  of  the  learning  parameter.  A  number  of  values,  constant  and 
variable,  can  be  used,  and  several  were  examined  to  determine  their  effects  on  the  learn¬ 
ing  algorithm. 

The  simulation  was  performed  on  the  training  set  which  consisted  of  the  sixteen 
input  vectors,  each  appended  w  ith  an  additional  eight  values  represendng  the  desired  out¬ 
puts  of  the  felements.  This  was  done  for  both  constant  and  variable  learning  parameters. 
The  constants  used  were  1.0,  0.5,  0,25,  and  0.1.  We  would  expect  the  lowering  of  the 
parameter  to  provide  smoother  paths  in  learning  space,  and  for  the  system  to  take  more 
iterations  of  the  test  set.  The  results  obtained  do  indeed  demonstrate  this. 

For  the  case  of  a  unity  learning  parameter,  the  network  learned  the  M-transform  in 
about  12  iterations  through  the  test  set.  The  changes  in  the  weights  are  very  discontinu¬ 
ous  or  jumpy.  WTen  the  parameter  was  lowered  to  0.5,  it  required  30  iterations  of  the 
test  set.  but  the  weight  changes  were  smoother.  W'hen  the  coefficient  was  lowered  to 
0.25.  the  number  of  iterations  to  the  desired  result  jumped  to  nearly  100.  The  paths  of 
the  weight  modifications  were  much  more  continuous.  Finally,  with  a  parameter  of  0.1 
the  system  took  almost  1000  times  through  the  test  set  to  reach  proper  M-transform  form, 
while  the  paths  became  very  continuous  in  appearance. 

The  second  case  tested  was  for  a  varying  learning  coefficient.  In  this  case,  the 
parameter  stans  at  0.5  and  is  reduced  in  0.1  increments  ever>'  10  cycles  through  the  test 
set,  stopping  at  0. 1 .  Results  showed  that  this  variable  parameter  provided  a  working  net¬ 
work  in  around  60  cycles  through  the  test  set. 

The  weight  matrix  obtained  by  the  learning  algorithm  is  drastically  different  from 
that  obtained  from  an  initial  design.  This  seems  reasonable  considering  the  nature  of  the 
learning  process.  An  example  of  a  final  state  of  the  weight  matrix  is  shown  in  Fig.  5.5, 
In  general,  highly  supervised  learning  seems  to  be  an  efficient  and  fairly  quick  method  of 
teaching  a  neural-like  network  simple  tasks  such  as  the  M-transform. 

In  general,  the  ANS  approach  to  the  M-transform  is  a  simple,  fast,  and  efficient 
method  of  performing  the  this  translation  invariant  transform. 

Once  the  images  have  been  "normalized"  to  a  standard  location  by  means  of  a  C- 
transform.  the  recognition  problem  remains  to  be  solved.  The  complete  system  for  pat¬ 
tern  recognition  is  shown,  in  block  diagram  form,  in  Fig.  5.6. 


Input  Weight  Matrix 


1.099  0.799  1.099  0.799  0.000  0.000  0.000  0.000 

3.199  -0.699  2.099  4.56e-05  0.000  0.000  0.000  0.000 

0.599  1.099  0.799  1.099  0.000  0.000  0.000  0.000 

-0.299  1.499  1.81e-05  1.399  0.000  0.000  0.000  0.000 

Network  Weight  Matrix 


0.000  8.16e-05  0.200  8.16e-05  0.200  8.16e-05  8.16e-05  8.16e-05 
-3.199  0.000  -0.600  -0.699  -3.099  1.499  1.799  -1.499 

7.71e-05  7.71e-05  0.000  7.71e-05  7.71e-05  7.71e-05  7.71e-05  7.71e-05 
-0.499  -0.399  -0.999  0.000  -0.999  0.999  -1.099  -0.399 

0.200  0.100  1.08e-04  1.08e-04  0.000  0.200  1.08e-04  1.08e-04 

2.699  9.99e-02  3.099  -0.799  -1.299  0.000  -2.099  -0.899 
-56.18  92.39  -58.08  95.38  35.30  -147.3  0.000  -3.699 

-6.299  8.099  -7.099  8.598  -11.29  -2.999  4.899  0.000 


Figure  5.5.  Learned  Weight  Matrix. 
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Hl'pneitJ  and  Tank -ire  not  adaptive,  in  other  words,  they  lack  a  learning  capability.  The 
ceneral  form  of  ihe  H&T  network  tor  pattern  recognition  is  shown  in  Fig  5  '’[6j, 

The  network  has  N  inputs^.!  .t,.  >.  \1  outputs  <i  \  ,,  c  and  an  offset -V. 

t  i;ven  a  signal  space  t  which  is  spanned  by  a  set  of  basis  functions  B^Jc  -  1,2,  M ,  find 
'.’  e  nest  digital  combination  of  basis  functions  which  descnbe  a  given  signal.  For  the 
'pe-.'.hv.  problem  of  pattern  recognition,  the  signal  space  f  would  consist  of  the  two 
t.menMonal  arras  of  pixels  which  make  up  the  computation  space  image.  For  an  8.x8 
liras.  \=o4  The  basis  functions,  in  this  case,  would  consist  of  64-dimensional  vector 
'cpre^entations  of  the  stored  images.  In  the  case  of  binary  images,  this  would  consist  of 
'lies  in  the  positions  where  a  pixel  is  excited  in  the  stored  image  and  zeros  elsewhere. 
1  .ich  ot  the  pnvessing  elements  represents  a  basis  function  or,  in  our  case,  a  stored 
.mage  It.  atier  reaching  a  steady  state,  a  processing  clement  has  value  1 .  then  this  ele- 
•oer.t  ;s  the  best  match  for  the  input  signal.  If  the  input  image  does  not  match  any  of  the 
't-red  images,  all  the  prtxessing  elements  will  have  value  zero  in  the  steady  state. 

In  designing  the  actual  network,  it  is  necessary  to  determine  the  value  of  the  con- 
-emmvn  ^trenglhs  To  this  end.  an  energy  function  must  be  constructed  which  will  have  a 
saiue  for  the  best  combination  of  basis  functions  descnbing  a  given  signal, 
liop'.cid  -ind  Tank  have  proposed  the  following  general  type  of  energy  function  16]. 


»■ 


> 


M,\  1  '.>r  I'l-'r.'r  Rf,  otrni;iMri 


E  =  i/2(J  -  -  \\) 

k  k 

The  first  term  is  minimum  when  a  combination  of  basis  functions  matches  the  input 
signal;  the  second  term  is  minimum  when  the  processing  elements  have  values  of  zero  or 
one.  In  terms  of  amplifier  outputs,  connection  strengths  and  external  inputs, 

.V  .V  V 

e  =  -  ZVI, 

i=i/=i  <=i 

Comparing  the  two  expressions, 

T  =  -(B  B) 

ij  ^  t 

and 

/,  =  [25,  -  1/2(B,B,)] 

Note  that  this  network  can  be  used  to  recognize  binary  images  which  are  non- 
translated,  non-rotated,  and  non-scaled.  In  addition,  the  network  is  not  self-adaptive.  It 
can  be  used,  in  conjunction  with  the  R  or  M-transform  to  implement  a  pattern  recogni¬ 
tion  subsystem  to  recognize  binary  images  in  the  computation  plane  of  a  HVS  sensor 
which  are  invariant  to  scaling  and  rotation  with  respect  to  the  optical  axis. 
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6.  DUAL  SENSOR  IMPLEMENTATION  AND  ANALYSIS 

6.1  Introduction 

One  of  the  major  goals  for  machine  vision  systems  is  the  development  of  an  algo¬ 
rithm  which  can  perform  invariant  pattern  recognition  in  real  time.  One  method  which 
might  be  used  to  facilitate  the  solution  of  this  problem  is  creating  a  vision  system  which 
obtains  both  a  rectangular  and  a  polar  representation  of  images.  The  inherent  characteris¬ 
tics  of  each  of  these  geometries  could  be  used  to  attain  algorithms  which  can  more  easily 
achieve  certain  pattern  recognition  tasks. 

A  rectangular  representation  consists  of  a  rectangular  array  of  uniform  sized  square 
pixels.  A  representation  of  this  type  is  desired  for  two  main  reasons.  The  first  is  that 
translations  in  this  representation  consist  of  simple  shifts  of  the  coordinate  axes  and, 
hence,  translation  invariant  algorithms  could  be  performed  more  easily.  The  second  rea¬ 
son  is  that  most  existing  pattern  recognition  and  motion  detection  algorithms  are  based 
on  a  rectangular  representation  of  images. 

A  polar  representation  consists  of  an  array  of  pixels  whose  boundaries  are  deter¬ 
mined  by  exponentially  spaced  concentric  circles  (for  log-spiral  pixels)  and  equally 
spaced  rays  emanating  from  the  center.  A  representation  of  this  type  is  desired  because 
scalings  and  rotations  result  in  simple  shifts  of  the  coordinate  axes  and,  hence,  scale  and 
rotation  invariant  algorithms  could  be  performed  more  easily  with  this  representation. 
Another  desirable  attribute  of  this  representation  is  that  fewer  pixels  arc  required  to 
describe  a  given  image  (since  pixels  in  the  periphery  are  larger).  So,  algorithms  based  on 
this  representation  should  be  faster. 

Given  the  unique  advantages  of  each  of  these  representations  it  is  likely  that  an 
algorithm  which  utilized  both  of  these  representations  could  better  perform  invariant  pat¬ 
tern  recognition  tasks.  Hence,  an  important  consideration  is  the  means  by  which  this  dual 
representation  of  images  can  be  obtained.  This  report  discusses  several  possible  methods 
which  might  be  used,  and  includes  an  analysis  of  the  error  introduced  by  some  of  these 
methods. 

6.2  Sensor  Technology 

In  order  to  understand  how  a  dual  representation  of  an  image  can  be  obtained  it  is 
necessary  to  know  something  about  the  existing  technology  for  obtaining  digital 
representations  of  images.  Area  image  sensors  consist  of  a  two  dimensional  array  of  pho¬ 
todiodes.  Each  photodiode  represents  a  pixel  in  the  image  representation.  In  most  cases 
these  photodiodes  are  realized  by  charge-coupled  devices  (CCD’s).  There  are  two  basic 
ways  in  which  pixel  values  can  be  read  out  from  an  area  image  sensor.  One  way  is  line 
by  line.  A  type  of  camera  which  uses  this  method  is  the  frame  transfer  CCD  imager.  A 
second  way  is  pixel  by  pixel.  A  charge  injection  device  (CID)  imager  is  capable  of  this 
method.  Figure  6.1  shows  the  basic  geometry  for  each  of  these  camera  types. 

The  individual  columns  of  the  frame  transfer  imager  are  separated  by  insulation. 
This  forms  the  vertical  boundary  of  the  pixels.  The  horizontal  boundaries  of  the  pixels 
are  maintained  by  voltage  rails  (not  shown).  The  shape  of  these  pixels  is  usually  square 
or  slightly  rectangular.  The  image  is  integrated  for  one  half  a  frame  penod  (during  which 
the  photodiodes  accumulate  charge  proportional  to  the  amount  of  light  which  falls  within 


each  photodiode’s  boundaries)  and  then  the  values  are  shifted  rapidly  (by  the  charge 
transfer  capability  of  CCD’s)  into  a  storage  area  which  is  shielded  from  incident  light.  In 
this  manner  the  smearing  of  images  during  read  out  is  limited.  Next,  one  pixel  from  each 
column  is  shifted  to  the  output  register  which  then  reads  out  the  values.  This  process  is 
repeated  so  that  pixels  are  read  out  line  by  line  until  all  pixels  are  read  out.  In  order  to 
speed  up  this  output  process  an  output  register  could  be  provided  for  each  column.  In  this 
way  the  values  would  still  be  read  out  line  by  line,  however,  it  would  be  faster  since  each 
line  would  be  read  out  at  the  same  time. 


la)  lb) 

Fig  6  1  (a)  frame  transfer  CCD  imager 
(b)  CID  imager 

The  pixels  of  the  CID  imager  are  insulated  from  each  other  on  all  sides  The  shape 
of  these  pixels  is  also  usually  square  or  slightly  rectangular  The  pixel  values  can  he  read 
out  one  at  a  time  in  any  order.  A  pixel  is  read  out  only  when  Nnh  us  row  and  column  are 
selected.  This  is  achieved  by  voltage  rails  as  seen  in  Fig  6  libi  .An  obvious  drawback  m 
this  type  of  imager  is  that  each  pixel  integrates  the  image  at  a  different  time  This  effect 
can  be  minimized,  however,  by  ensunng  that  the  total  read  out  time  is  small  compared  to 
the  image  integration  time. 

All  standard  CCD  cameras  produce  image  representations  which  are  rectangular 
However,  this  docs  not  mean  that  other  cameras  could  not  be  built  which  vield  other 
representations  directly  A  novel  area  image  sensor  could  he  ^onstnicted  to  sield  dit 
ferent  representations  simply  by  changing  the  boundancs  which  dchne  the  pixels  Mow 
ever,  construcung  a  CCD  camera  with  non  rectangular  boundaries  would  probable  post- 
some  difficulties  for  chip  fabneators.  difficulties  which  may  or  mav  not  be  solvable 

For  example,  suppose  a  camera  were  desired  which  would  sield  a  (solar  representa 
tion  of  images  directly  The  pixel  hxsundanes  would  have  to  lixsk  as  shown  m  F  ig  6  T 
One  problem  which  exists  for  this  geometrv  is  that  pixel  values  would  all  have  to  be  nor 
malizcd  since  not  all  pixels  .ire  the  same  si/e  Second,  a  frame  transter  imager  probabis 
could  not  be  built  with  this  geometrs  because  there  o  no  .orsi-nient  pl.ue  ’o  tran-.let  ’be 


pixel  values  before  reading  them  out.  However,  a  CID  imager  could  probably  be  con¬ 
structed  without  much  difficulty.  The  pixels  could  be  addressed  b\  constant  radius  and 
constant  angle  voltage  rails. 

It  also  might  be  possible  that  other  novel  image  sensors  could  be  constructed  with 
many  varying  image  reprcsentauons  being  possible  Some  of  the  solutions  to  obtaining  a 
dual  representation  of  images  rely  on  this  possibility 
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t>.}  Method  for  Obtaining  Dual  Repmenialinnx  of  Images 

Perhaps  the  best  was  <>t  obtaining  both  a  reviangular  and  .1  polar  reprrscniation  't 
mages  would  be  to  devote  an  mdisulual  lamer.i  to  eash  representation  V  standard  frame 
•ranster  (  ri)  camera  could  be  used  to  obtain  the  rectangular  representation  direcils  and  t 
nosel  sensor  with  us  photosensors  laid  out  m  a  polar  formal  .ouUl  fse  use«l  lo  obtain  'he 
ooi.u  representation  directls  Ihis  mcth<nl  would  probable  work  howeser  another 
methorl  which  would  likelv  ^se  ‘ess  expensive  .ind  less  huiks  would  be  10  use  a  ongie 
amera  to  obtain  both  reprrsentati -ns  (  or  example  a  standard  (  (  I)  camera  -  ouiil  oe 
,se<!  Ihis  aiTwra  would  sicld  the  rec  langulai  representations  dirrctls  and  the  [K'lar 
''•preseni.ition'.  ould  be  obiaine<l  f'v  .  ombining  rectangular  pixels  to  torm  each  mdisi 
luai  [soiar  pixfi  Phe  value  >t  each  pixel  would  be  taken  ’o  'se  the  .iverage  v.ilue  '* 

■Un  Ti  tancuiar  pixe  s  wh|^r'.  n.ivr  .11  .e.isi  nail  'heir  .irea  wntiit'  'he  txolaf  pixel  '  'smt 
t.irifs  Iti  ’tii'  'ii.inner  txoih  'rpre seniaiiori -.  .  ouid  •••stained  ’’om  i  uitle  amera 

I  nc  .imer.i  ,ise<l  "eeii  'nu  n.icn  ts  ptioios** n u s  aul  'i.’  '  1  ifi  t.rniiuial  'or’ri.i'  i' 

•  l'’  ‘rr"  't'r  I  .\meri  .c ’• '  '  ”.(■  '■  phi  >!i  >se  r' SI  >rs  .tid  'Ii'  •  ,1  ttc'’"'’  'or'ii.il  m;ti‘'  'x 


able  to  better  obtain  the  dual  representations.  For  example,  a  camera  with  tnangular  pix¬ 
els  might  be  used.  Tnangular  pixels  could  be  combined  to  perfectly  form  rectangular  pix¬ 
els  and  could  probably  combine  to  form  polar  pixels  better  than  rectangular  pixels  could. 
Thus,  it  might  be  advantageous  to  use  a  camera  with  tnangular  shaped  pixels  rather  than 
rectangular  shaped  pixels. 

It  IS  clear  from  the  above  discussion  that  a  disadvantage  exists  in  using  only  one 
camera  to  obtain  both  a  rectangular  and  a  polar  representation  of  images.  Regardless  of 
the  shape  of  the  pixels  of  the  single  camera,  there  w-ill  always  be  some  error  in  trying  to 
form  both  rectangular  and  polar  pixels.  The  camera  which  minimizes  this  error  might  be 
deemed  the  best  solution.  In  order  to  determine  the  camera  type  (determined  by  the  lay¬ 
out  of  Its  pixels)  which  is  best,  a  means  of  analyzing  this  error  is  needed.  This  error  is 
examined  in  the  next  section. 

6.4  A  Method  to  Determine  which  Solution  is  Best 

The  possible  solutions  being  considered  all  require  the  supenmposition  of  a  compu¬ 
tation  plane  (determined  by  the  geometry  of  the  representation  desired)  atop  an  image 
plane  (determined  by  the  geometry  of  the  photosensors  of  the  camera  used),  and  the  sub- 
^equent  combining  of  image  plane  pixels  to  form  the  computation  plane  pixels.  This 
combining  is  not  exact,  since  the  image  plane  pixels  when  combined  together  may  not 
exactly  form  the  computation  plane  pixel.  Hence,  this  process  can  be  said  to  add  ’noise  " 
lo  the  system  This  noise  can  be  quantified  by  considering  two  specific  errors  which 
.irise  from  the  pnx'ess  of  combining  image  plane  pixels  to  form  computation  plane  pix¬ 
els  these  errors  are  as  follows: 

error  one  Area  included  in  computation  plane  pixel 
approximation  which  actually  lies  outside 
the  pixel  boundaries. 

r'rror  two  Area  not  included  in  the  computation  plane 
pixel  approximation  which  actually  lies 
inside  the  pixel  boundaries 

1  ceures  o  i  .md  h  4  on  the  next  page  illustrate  these  two  error  types  for  the  case  when  the 
•’■ace  puine  .s  restangular  and  a  log  spiral  plane  is  superimposed  (sn  it 

error  value  which  will  he  proportional  to  the  actual  error  of  appmximating  a 
I’lr.pwt.ition  plane  pixel  with  a  combination  of  image  plane  pixels  is 

f  error  one  ♦  error  twoi/iarea  ot  computation  plane  pixel) 

I  be  '  .rr,  >t  -rror  mie  aiul  error  two  needs  to  be  normalized  bv  ihe  area  ot  the  compiita 
,  '  ciiiie  p:sei  ’x'lrik;  approx irnaicd  since  it  is  this  ratur  to  which  the  possible  error  'in 
•  e  ompiji.tiioii  piane  pixel  value  i  will  depend  on  Another  t.ut  to  note  is  that  the  map 
:  L  ’ei  nil m lie  >1  . nn v  iiu  imhng  image  plane  pixels  with  at  least  one  halt  their  areas  >.on 
i.'  ed  Ai'hi:  'he  omput.ition  plane  pixel  being  appnmmated  tends  to  keep  f  minimal 

It  .  ■•s  in  i!  II  c  'ne  v.due  oi  [  for  varnuis  image  plane  geometnev,  a  means  ot  deter 
,■  ii  I  ’r-.'  .s  possible  Ihe  solution  which  minimizes  f  tor  ihc  sornpicie 

’  ippi’  c  '•  'f  1  ‘"t  'Ml  h  ..  ornpuiaiioii  plane  pixei  .ip;'ro\ini.itioiv  would  be  'he 


best  solution. 

The  next  section  includes  an  analysis  of  the  error  variable  E  for  different  image 
plane  geometries  and  computation  plane  geometries.  The  value  of  E  is  determined  by 
computing  areas  for  error  one  and  error  two  for  each  computation  plane  pixel  approxima¬ 
tion. 

6.5  Analysis  of  Error  for  Possible  Solutions 

The  first  solution  to  be  examined  is  the  case  where  the  camera  used  is  a  standard 
CCD  camera  with  square  pixels.  This  solution  is  desirable  for  two  main  reasons.  The  first 
is  that  many  cameras  are  available  which  have  square  pixels,  thus  a  new  camera  would 
not  have  to  be  developed.  The  second  reason  is  that  square  pixels  can  be  combined  to 
form  polar  pixels  with  little  extra  circuitry  needed.  Thus,  if  it  is  found  that  the  error  intro¬ 
duced  by  approximating  polar  pixels  with  combinations  of  square  pixels  is  not  so  large 
that  the  polar  representation  is  unacceptable,  then  this  solution  would  probably  be  recom¬ 
mended,  due  to  its  ease  of  implementation. 

In  order  to  determine  the  amount  of  error  introduced  by  this  method  a  Fortran  pro¬ 
gram  was  written  to  calculate  this  error.  The  program  first  determines  the  square  pixels 
which  have  at  least  one  half  of  their  area  within  the  given  polar  pixel  and  then  determines 
the  value  of  error  one  and  error  two  (defined  in  previous  section)  for  each  polar  pixel 
approximation.  In  calculating  these  values  a  slight  approximation  is  used.  This  approxi¬ 
mation  is  illustrated  by  the  diagram  in  Fig.  6.5.  This  approximation  will  have  only  a  very 
small  effect  on  values  of  error  calculated. 

The  results  of  several  computations  are  included  in  graphs  one  through  five  in 
Appendix  6.A.  A  summary  of  these  results  is  given  below. 

The  geometry  of  the  image  plane  and  the  computation  plane  for  graphs  one,  rwo, 
and  three  was; 

image  plane  ;  512x512  array  of  square  pixels  (-5  <  x,y  <  5) 
computation  plane  :  log-spiral  array,  36  rings  and  36  pixels 
per  ring.  Radius  of  inner  ring=0.5,  and 
radius  of  outer  ring=5.0. 

Graph  One  : 

This  graph  shows  how  error  one,  error  rwo,  and  error  one  -t-  error  two  vary  with 
radius.  For  each  ring  the  average  value  of  these  quantities  was  determined  and  plotted. 
From  this  graph  it  can  be  seen  that  error  one  and  error  rwo  tend  to  stay  very  close  in 
value.  This  is  expected  since  pixels  arc  included  only  if  they  have  one  half  their  area 
within  the  computation  plane  pixel.  Also,  it  can  be  seen  that  the  values  all  tend  to 
increase  with  increasing  radius.  This  is  due  to  pixel  sizes  growing  in  periphery. 

Graph  Two : 

This  graph  shows  how  E  (defined  in  previous  section)  varies  with  radius.  For  each 
ring  the  average  value  of  E  was  determined  and  plotted.  From  this  graph  it  can  be  seen 
that  E  is  largest  for  small  values  of  radius  and  then  steadily  decreases  as  radius  increases. 
This  shows  that  the  approximation  becomes  better  as  the  log-spiral  pixels  become  larger, 
as  expected. 


approarima  tion 


Fig.  6.5  Diagram  showing  straight  line  approximation  of  circular 
arc  across  square  pixel. 

Graph  Three : 

Graph  Three  :  This  graph  shows  how  error  one,  error  two,  and  error  one  +  error  two 
vary  with  angle,  (0  <  angle  <  pi/2).  For  each  ray  of  pixels  the  average  values  of  each  of 
these  quantities  was  determined  and  plotted.  It  can  be  seen  that  the  graphs  are  symmetric 
about  pi/4.  This  is  due  to  the  number  of  pixels  per  ring  being  a  multiple  of  four.  This 
causes  the  computation  plane  to  be  symmetric  about  pi/4  and,  hence,  these  graphs.  The 
values  of  error  one  and  error  two  can  be  seen  to  be  lowest  at  angles  of  0  and  pi/2.  This  is 
due  to  the  log-spiral  pixels  sharing  a  boarder  with  the  square  pixels  at  these  angles  and. 
thus,  tends  to  lower  the  value  of  these  errors. 


Graph  Four : 

image  planes  :  1)  600x600  array  of  square  pixels,  (-5  <  x,y  <  5) 

2)  500x500  array  of  square  pixels, 

3)  400x400  array  of  square  pixels, 

4)  333x333  array  of  square  pixels, 
computation  plane  :  log-spiral  array  of  pixels,  36  rings  and 

36  pixels  per  ring.  Radius  of  inner  ring=0.5, 
radius  of  outer  ring  =  5.0 

This  graph  shows  how  the  curves  E  vs.  radius  vary  for  square  arrays  of  different  sizes.  As 


'1 


expected,  the  smaller  the  individual  square  pixel  sizes,  the  lower  E  is  at  a  given  radius. 

Graph  Five  : 

image  plane  :  512x512  array  of  square  pixels,  (-5  <  x,y  <  5) 
computation  planes  :  log-spiral  array  of  pixels,  Radius  of  inner 
ring  =  0.5,  radius  of  outer  ring  =  5.0 

1)  28  rings,  28  pixels  per  ring 

2)  44  rings,  44  pixels  per  ring 

3)  52  rings,  52  pixels  per  ring 

4)  60  rings,  60  pixels  per  ring 

This  graph  shows  how  the  curves  E  vs.  radius  vary  with  different  computation  plane 
geometries.  As  expected,  the  smaller  the  individual  log-spiral  pixels  the  larger  E  is  at  a 
given  radius. 


The  next  possible  solution  to  be  examined  is  the  case  where  the  camera  used  is  a 
standard  CCD  camera  with  slightly  rectangular  pixels.  A  program  was  written  to  analyze 
this  case  since  many  of  the  available  CCD  cameras  actually  have  rectangular  pixels 
rather  than  square.  The  results  for  this  case  can  be  found  on  graph  six. 

Graph  Six  : 

image  plane  :  458x572  array  of  rectangular  pixels,  (-5  <  x,y  <5) 
computation  planes :  log-spiral  array  of  pixels,  Radius  of  inner 

ring  =  0.5,  radius  of  outer  ring  =  5.0 

1)  28  rings,  28  pixels  per  ring 

2)  44  rings,  44  pixels  per  ring 

3)  52  rings,  52  pixels  per  ring 

4)  60  rings,  60  pixels  p)er  ring 

This  graph  shows  how  the  curves  E  vs.  radius  vary  as  the  computation  plane  geometry  is 
changed.  The  result  was  very  similar  to  the  result  obtained  with  square  pixels.  The 
smaller  the  individual  log-spiral  pixels  the  larger  the  value  of  E  at  a  given  radius. 

The  results  of  this  graph  show  that  the  effect  of  the  pixels  being  square  or  slightly 
rectangular  does  not  significantly  alter  the  value  of  E.  Thus  a  camera  with  either  square 
or  slightly  rectangular  pixels  could  be  used. 

The  next  possible  solution  to  be  examined  is  the  case  where  the  camera  used  is  a 
novel  sensor  with  triangular  shaped  pixels  as  shown  in  the  Fig.  6.6.  A  camera  with  tn 
angular  shaped  pixels  as  shown  would  be  desirable  since  the  pixels  could  be  combined  to 
perfectly  form  rectangular  shaped  pixels  and  could  probably  be  combined  to  form  log- 
spiral  pixels  better  than  rectangular  pixels  could.  In  order  to  examine  this  a  Fortran  pro¬ 
gram  was  written  which  calculates  the  values  of  error  one,  error  two,  and  E  (as  defined 
before)  for  the  case  when  the  image  plane  has  triangular  pixels  and  the  computation 
plane  is  log-spiral.  The  results  of  these  calculations  are  shown  on  graph  seven,  .\ppendix 
6.A. 
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Graph  Seven  : 

image  plane  362x724  array  of  tnangular  shaped  pixels 
( -5  <  x.y  <  5 1 

computation  planes  log-spiral  array  of  pixels,  radius  of  inner 
nng  =  (1  5.  radius  of  outer  nng  =  50 

1  I  2S  nngs,  28  pixels  per  nng 

2  I  36  nngs.  36  pixels  per  nng 
'  I  44  nngs.  44  pixels  per  nng 

This  graph  shows  how  the  curves  T  vs  radius  sarv  as  the  computation  t'lane  gei't-icfs  ' 
caned  As  expected,  as  the  size  of  the  individual  log  spiral  p;\eis  4ecre.ise3.  ’^e  ■' 

F-.  at  a  given  radius  increased 
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2)  458x572  array  of  rect.  pixels 
362x724  array  of  cri.  pixels 

computaiion  plane  :  log-spiral  array  of  pixels.  28  nngs  and 
28  pixels  per  nng.  Radius  of  inner  ring  =  0.5. 
radius  of  outer  ring  =  5.0 

(iraph  Nine  : 

image  planes  :  1 )  512x512  array  of  square  pixels  (-5  <  x.y  <5) 

2)  458x572  array  of  rect.  pixels 

3)  362x724  array  of  tn.  pi.xels 

computation  plane  :  log-spiral  array  of  pixels.  44  rings  and 
44  pixels  per  nng.  Radius  of  inner  ring  =  0.5. 
radius  of  outer  nng  =  5.0 

These  tuo  graphs  show  how  the  curves  of  E  vs.  radius  compare  for  three  different  image 
plane  geometnes.  The  curves  for  square  and  rectangular  geometries  are  seen  to  be  very 
close  together.  The  curves  for  the  tnangular  geometry  had  values  for  E  which  were  for 
the  most  part  less  than  the  values  of  E  for  the  other  geometries. 

Since  the  image  plane  geometnes  were  chosen  so  as  to  make  individual  pLxel  areas 
the  same  for  all  three  cases,  it  would  appear  that  using  an  image  plane  which  has  triangu¬ 
lar  shaped  pixels  would  better  yield  a  log-spiral  representation.  The  rectangular  represen¬ 
tation  which  IS  obtained  by  combining  two  triangular  pixels  for  each  rectangular  pixel 
necessanU  creates  a  rectangular  representation  with  pixels  twice  the  size  of  the  indivi¬ 
dual  image  plane  pixel.  Thus,  the  rectangular  representation  created  would  have  less 
'esolution  then  if  a  camera  with  rectangular  shaped  pixels  of  size  equal  to  the  triangular 
pixeiN  were  used  to  obtain  the  rectangular  representation. 

Another  possible  solution  to  the  dual  mapping  problem  is  also  analyzed  as  a  matter 
•t  cunosity  This  solution  consists  of  an  image  plane  with  log-log  shaped  pixels  as  shown 
Fig  6  '  .A  camera  which  had  its  photosensors  laid  out  in  this  manner  would  obtain  a 
,.'g  k>g  representauon  of  images.  A  log-log  representation  of  images  is  essentially  a  rec- 
Mrgular  representation  with  a  similar  charactensuc  of  log-spiral  representations.  That  is. 
r-ixelv  m  the  periphery  are  larger.  Thus  less  pixels  are  required  to  desenbe  a  given  image 
.nd  aigi'nthms  which  use  this  representation  should  be  faster  In  order  to  determine  how 
wed  log  iog  shaped  pixels  combine  to  form  a  log-spiral  representation,  a  Fortran  pro- 
c-am  was  written  which  calculates  values  of  error  one.  error  two.  and  E  (as  defined 
■set 're  tor  the  case  when  the  image  plane  is  log-log  shaped  and  the  computation  plane  is 
.’g  scirai  The  results  of  these  calculations  are  shown  on  graph  ten.  Appendix  6. A. 
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This  graph  shows  how  the  curves  E  vs.  radius  vary  as  the  computation  plane  geomecrv  is 
altered.  As  expected,  as  the  size  of  the  individual  log-spiral  pixels  decreased  the  value  ot 
E  increased  for  a  given  radius.  The  values  of  E  are  significantly  higher  than  those  found 
for  similar  cases  in  which  square  and  triangular  shaped  pixels  were  used.  However,  this 
does  not  mean  that  a  log-spiral  representation  obtained  from  combining  log-log  shaped 
pixels  would  not  be  adequate.  This  will  depend  on  the  particular  application  being  con- 
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M  (Operation  continues  unti'  all  pixel'  in  the  i-uacni  frame  have  *sern  read 

6.6.2  Operation  of  the  digital  summing  arrangement  fFig  6.9 1 

1 )  At  the  beginning  of  a  pixel-summing  cvcle.  the  A  register  is  zeroed,  and  the  court 
down  and  S  registers  arc  loaded  with  the  run-length  of  the  next  polar  pixel 

2)  As  each  rectangular  clement  is  shifted  m.  it  is  passed  through  the  .A/D  convener  and 
summed  in  ALL’l.  The  results  of  each  addition  arc  placed  in  register  A  so  that  a 
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Figure  6.9.  Digital  pixel-summing  arrangement. 


cumulative  sum  may  be  obtained.  Also,  the  count-down  register  is  decremcntecL 

3)  When  the  count-down  register  indicates  that  the  last  element  has  been  summed,  the 
total  is  divided  (in  ALU2)  by  the  run-length  for  that  pixel  (stored  in  the  S  register), 
thereby  foiming  the  average  pixel  intensity. 

4)  The  cycle  continues  until  the  entire  frame  has  been  read. 

A  simple  example  of  this  ojjeration  follows.  We  will  use  the  second  approach  (digi¬ 
tal  summing)  and  detail  the  operation  in  a  clocked  fashion.  Assume  a  single-  or  muld- 


phase  wiockang  airangcmcm  in  which,  during  each  complete  cUxk  cvcle.  a  smgi'.  rec 
tangular  element  is  shifted  out  of  the  sensor  and  both  ALl  s  are  able  to  pertorm  their 
required  functions  uf  called  upon  to  do  soi.  We  also  assume  that  the  hardware  requires 
one  extra  clock  cycle  to  perform  the  set-up  operations  necessary  prior  to  summing  the 
elements  which  form  a  parucular  pixel.  For  this  example,  the  run-lengths  of  the  hrst 
three  polar  pixels  to  be  extracted  (all  that  we  will  consider  at  this  umei  are  4,  P.  and 
I  completely  arbitrary  at  this  point  since  we  do  not  know  the  layout  of  the  polar  pixels  i 
Simulation  is  as  follows  (clock  cycle  followed  by  funcuons  performed) 

1)  Zero  ROM  address.  Load  count-down  and  S  registers  with  run  length  of 
first  polar  pixel  to  be  shifted  out  (4i  Clear  the  A  register 

2)  Shift  first  rectangular  clement  into  A;!)  convener  (flash  convener)  and  add 
output  to  value  in  the  A  register  (0).  Store  result  in  the  A  register  Dcctc 
ment  count-down  register  Count-down  register  docs  not  equal  zero  lyct)  so 
conunuc. 

3 1  Shift  next  rectangular  clement  into  A/D  convener  and  add  output  to  value  in 
the  A  register.  Store  result  in  the  A  register  Decrement  count  down  regis¬ 
ter. 

4)  Same  as  3). 

5)  Same  as  31,  However,  now  the  count-down  register  is  zero  Therefore, 
divide  the  output  of  ALUl  by  the  contents  of  the  S  register  (the  run-length i 
in  ALU2  to  form  the  average  pixel  intensity.  Store  this  in  memory 

6)  Increment  the  ROM  address  to  point  to  the  next  run-lcngih  Load  the  count¬ 
down  and  S  registers  with  the  next  run- length  (17).  Gear  the  A  register. 

7)  Shift  first  rectangular  clement  of  the  next  polar  pixel  into  the  A/D  convener 
and  add  output  to  value  in  .the  A  register  (0).  Store  result  tn  the  A  register 
Decrement  count-down  register. 

8)  Shift  next  rectangular  clement  into  A/D  convener  and  add  output  to  value  in 
the  A  register.  Store  result  in  the  A  register.  Decrement  count-down  regis¬ 
ter. 


conunuc  .  .  . 

23)  Shift  next  rectangular  clctncnt  (n'^  in  this  pixel)  into  A/D  convener  and  add 
output  to  value  in  the  A  register.  Store  result  in  the  A  register.  Decrement 
count-down  register.  Count-down  register  is  now  zero.  Therefore,  divide 
the  output  of  ALUl  by  the  contents  of  the  S  register  (17)  in  ALU2  to  form 
the  average  pixel  intensity.  Store  this  in  memory. 

24)  Increment  the  ROM  address  to  point  to  the  next  run-length.  Load  the  count¬ 
down  and  S  registers  with  the  next  run-length  (32).  Clear  the  A  register. 

25)  Shift  first  rectangular  element  of  the  next  polar  pixel  into  the  A/D  converter 
and  add  output  to  value  in  the  A  register  (0).  Store  result  in  the  A  register. 
Decrement  count-down  register. 
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I  his  ai^iTUhiit  ;s  >et^  sssietn.HK  aiul  ea^iiN  .tTipletiienuM  t'  i'p'A" 

Sirue  'pixels  nn  ihe  innemiosi  rin^  i>t  the  polar  .nT.int^etrretn  'i.ise  die  iitiaile  :  ire.i 
■hc'.  Alii  '-le  fiTTneil  trntii  the  smallest  numher  ol  res tan(^tilar  elemetiis  <  i 'Pseijue;  '  ■.  n, 
s  Dnsiderations  arul  error  anaUsis  must  he  pertormeil  usmk:  these  pixels  Piat  this 
s  si>  tor  error  arialvsis  ,s  esidcnt  from  an  averaging  point  of  vica  Sukc  the  [X'lar  'pixei  s 
riiensitv  Atli  he  the  average  miensitv  of  the  restangular  elen'  'its  Ahose  sCnters  in- 
Aithin  its  horr'er  the  smaller  the  area  of  the  polar  pixel,  the  gre ner  a  ill  he  the  error  or 
-rror  v.irianse  of  the  estimated  intensity  As  for  timing  sonvider  nions.  sitKe  one  eloxk 
.'.sle  IS  Aasted  in  preparation  tor  each  polar  pixel  (preparing  ifie  hanlware  to  sum  and 
iverage  the  incoming  rectangular  elementsi,  the  smaller  the  polar  pixel  area,  the  greater 
Alii  he  the  cl(Kk  cycle  overhead  expended  Hence,  the  slcpendencc  of  timing  stmsidera 
■ions  on  the  innermost  nng  of  polar  pixels  (smallest  polar  pixels)  is  lustihed 

The  subject  of  error  analysis  is  one  which  must  ccnainly  he  discussed  since  there 
Alii  never  hc  a  perfect  ht  of  rectangular  elements  into  any  given  [-Hilar  pixel  It  is  this 
.irca  which  ac  will  now  address  All  analysis  will  hc  performed  using  an  or  of  nngs 
configuration  for  the  logarithmic  spiral  scnsxir  The  ('CD  sensor  ai11  consist  ot  ScS 
sxjuarc  light-sensitive  elements  (the  term  rectangular,  when  used,  refers  to  the  overall 
sensor  configuration  rather  than  actual  pixel  dimensions) 

6.7  Error  Analysis 

As  seen  in  Fig.  6.4,  there  is  never  a  perfect  ht  of  rectangular  elements  into  polar  pix 
els.  We  have  chosen  to  assign  a  given  rectangular  clement  to  a  polar  pixel  if  its  center  is 
within  the  defined  boundary  of  that  pixel.  So.  not  only  will  there  be  error  in  the  area 
covered  by  the  rectangular  elements,  but  some  of  the  intensity  infoimation  attributed  to  a 
given  pixel  will  be  rfom  outside  its  boundary.  Furthermore,  any  actual  error  analysis 
must  neccssanly  rely  on  the  specific  arrangement  and  resolution  of  the  polar  pixels  and 
rectangular  elements.  That  is.  in  a  given  rectangular  array  of  elements,  the  number  of 
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for  the  arc-of  nngs  configuration.  Also  r.  and  r_,  arc,  in  turn,  dependent  on  the  fovea  size, 
the  overall  array  size,  and  the  number  of  nngs,  N,  If  we  have  an  VlV  array  ot  square 
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I  herctorc.  the  sampled  vanance  is  given  by 


1:  v\f  ik’culf  ii'  irulude  those  error  esents  \^hich  are  hetvveen  one  and  I'ao  pixels 
!  a  r.  troni  :he  actual  polar  pixel  area  uermed  double  errors i.  vxe  must,  once  again 
os.ime  M>me  distribution  ot  these  events  so  that  the  average  area  converges  to  the  actual 
wei  ,trea  ITiere  are  tixi  mans  sariables  (the  problem  is  under-constrained)  so  \^e  must 
oiistrain  sinne  so  that  a  unique  solution  is  possible  Therefore,  w,e  will  assume  that  the 
'iMponion  ('t  errors  of  magnitude  <l*ni  to  those  of  magnitude  i2-ai  is  such  that  their 
oerage  is  also  zero  bunher,  we  can  assume  that  these  double  '  errors  account  for  a  por- 
;ori  ot  all  error  events  An  accounting  of  the  quantity  and  magnitude  of  each  of  these 
T'l'i  events  is  given  below  iper  n.,  polar  pixels,  error  normalized) 

em^r  maiznitude  quantity 
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1  he  sampled  variance  then  becomes 


<7*  =  ■<  ( l  -3)  a  ( 1  -a)  -  3  (2+a-a  )  ^ 


(N  -1)A‘  ^ 

p 


For  (3=0.  this  reduces  to  the  former  case  in  which  we  considered  only  Mngle  error 
e\ents.  For  the  worst  case,  3=1.  a=.5,  the  approximate  sampled  variance  is 

2  25  N 

2  ^ 
o'  = 

(N;-nA‘ 

which  is  nine  times  larger  than  the  worst  case  (a=.5)  for  single  error  events  only. 

Choosing  a  maximum  allowable  variance  of,  say.  5%,  Figs.  6. 10  and  6. 1 1  show  how 
the  number  of  rings,  N^,  relates  to  the  number  of  polar  pixels  per  ring,  N^.  for  arrays  of 
size5l2a512  and  1024  jt  1024  pixels.  Plots  of  3  =0  and  3=  25  are  shown.  Increasing  either 
N.  or  will  increase  the  error  vanance  -  an  adverse  effect.  Figure  6.12  shows  how  the 
area  of  the  pixels  on  the  innermost  nng  of  the  polar  array  relate  to  the  error  vanance. 


Number  of  Rings 


Figure  6. 10.  Pixels/ring  vs.  Number  of  rings  for 
Error  Variance  =  .05  (512  x  512  pixels) 


Number  of  Rings 


Ftg.  6.n  Pixels/ring  vs.  Number  of  rings  for 
Error  Variance  =  .05  (1024  x  1024  pi.xels) 


in  PI  M  (  ( )NnM  1  \  I  K  )N  hk(  >M  I  tfi  K  n  < 


I  Introdui. (ion 


”0  .1 


•s r .1  ■ . .1  ' ’.r  ..V '  . "V 


,1''  ^  .*r  v  V  - 


.i.  Me  Mv .  ..rat:-  ''  i; 


'  \  '  Vv  '  T 


^  •  t*  ^  '  C.*  .  ■  '  f  .  ’ .  '  ”  '  *  !  ^  1  r  *  T  ■’  \  C  ■ '  ,  '  '  '^  '  ^  ^  ^  ^  \  ^  .%  '' 

•is.  V  c.* .  ;,‘r wi!  c."  vl  r  .  .1.  ;*  \  *'' ,i  ■ '  N . V  '*■  *■*■;*•'  j  .  tr  .1  'V  * ,  ■  ■  '  '  '  i’  ’  x.  .  >  ■ '  “T  y  .i 

'lor-  FOE  .  .'!-f  NrH>r\s  •■'c  rv;e"N,.'-'  ••  ••'c  'c . ,■•.'•  -c  ^  :r 

"a'.C"  F  ^1  '»<  >  'r  ,^1  ■  '  >*  'T  -v’.  'Cv  ■  '  ■  '  -.  .1  •  \  ,t  ' '  C  4  '  .»■■,"  '  V  '  ;■ 

F  ' 'F  Of  NX  .u  F  <H 

\  d.,  o;  trie  ^orK  .U'nc  m-  i-ic  . .•  ••  .c;  r  -^av  ‘x-;- 

..rnijevl  :o  ’.h',>  va>c  '^herr  tne  I  OS  .orrrNp^''0'  'A.-.-  ;-e  Ft  '[  ',•■  ■’-os:  ^  s.,a. 

"OAc^rr.  ’.nc  MMia.  nOd  SF  envonipaNNCN  .i  .:v,,:c.:  v'riot’  '..■'iC  W  .  ,fA 

[  a’.erai  routKMi  ot  ’.he  visaa.  N<*n\or  .n  a  ''.c'”-  "  ‘.Sr  ..{:■.•■  ■ 

:-.e  LOS  O.c  FOh  v'leonicrru  ad'. ,  oSievO  a.or-.c  a',  v  to-- 

[  ( )F'  'o  '.nc  F-(X'  This  rotat'.on  hoAc\.cr  -cs.i.tv  .:•  :nc  ’.ra"  ■ '.'r-r-.a:;.':’  't  "lon.or'  .'■  -.'•c 
S  F-  rrofi',  radiai  cvpariMon  ;o  .aterai  imnsiation  as  :nc  a"«;.^'  >r:Accr  oc  1  idS  a-v:  '.''c 
F  OE'.  approac.Scs  (.'ornputatK’n  ot  'I'*  space  .s  rns're  ...;  .cOcr  tncsc  ^o(io.:;o''s 

In  the  tolk'Ainc  sections  sve  auI  discuss  rese-L-^n  .'r  '•staimns;  and  atdi/i-'t:  tne 
intoimanon  inherent  m  siptic  hiivc  tor  the  computatK'n  ot  Jcptn  and  structure  in  tn.e  S  F 
An  algontnm  tor  the  detenninatK’'n  ot  the  tls'cc  field  a  ill  also  'xr  div-  ussed  aione  Aith  a; 
examination  ot  the  problems  Ahich  anse  m  such  computations  Possihie  phs  sioiot'.c  a. 
correlates  or  optic  floA  computanons  a  ill  be  presented  next  F- mails,  a  ncA  appn'acn  to 
the  problem  using  the  BVS  and  sphencai  ssmmetr.  a  ill  be  examined 

7.2  Overview  of  Turrent  Research 

Work  on  the  determination  ot  object  structure  and  depth  t'rom  EM  has  des  eloped 
rapidly  over  the  last  decade.  Vanous  schemes  have  been  pnsposed  tor  the  determination 
of  optical  flow  and  the  extractor  of  information  from  the  flow  held  .A  general  criticism 
of  most  current  work  is  that  the  proposed  schemes  are  computationally  intensive  and 
therefore  of  limited  usefulness  in  real-time  applications  Rather  than  res  lew  this  work  in 
detail,  however,  we  will  atte'npt  to  provide  a  general  understanding  of  the  current  para¬ 
digm  by  examining  the  work  of  selected  researchers. 

Prazdny  [2-4]  has  made  significant  contributions  to  this  held.  He  analyzes  optic 
flow  in  terms  of  the  projection  of  a  six  parameter  transformation  (three  translational  and 
three  rotahonal  parameters)  in  a  Canesian  coordinate  system.  Prazdny  has 
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v.i  -  '  M-  nn.'tx'^eo  a  s.  "erne  tor  determining  tvtth  the  structure  of  3D  space  and 

oan  eie'-  '•  'D  motion  ot  oOiects  in  the  \'F  He  unlizes  a  tuo  stage  process.  The 

•  ^;ace  .iete""."ev  ;ne  orticui  rEo\*  neid  and  the  \econd  >tage  interprets  this  held, 
■'■-i  ■  c  ■■  '"'  at.o-'  .i>'at  tr.e  Ntracture  and  motion  parameters. 

V-t.v  n  ■>  inat  orticai  hovs  neid>  are  noisy  in  the  sense  that  velocity  vectors 
.1.  "v  s.  a.  -  ..x  or-e.jted  and  that  this  lead  to  a  loss  or  corruption  of  the  available 
•  ■"'■.it..'-  \  o^mer  rn'Oiem  anses  if  there  are  multiple  obiects  in  the  field  of  view. 

'  '  .a:-  'e',..t  .'v...;Mon  '^hich  leads  to 'ingulanties  idiscontinuities)  in  the  flow  field. 

\  _  tor  o'lerrretmg  opp^al  flow  must  be  robust  enough  to  handle  these  sons  of 

\d.^  '  aro'oa^  n  nrst  ranitions  the  flow  held  into  connected  segments.  These  seg- 
•'e■lt^  .u-e  •"cn  merged  under  the  hyps^thesis  that  segments  aith  similar  characteristics 
••.•v,.  :  tne  ^ame  rtgicis  moving  obiect 

'  ne  a.gor.tnm  searches  for  'D  motion  parameters  which  satisfy  all  segments  within 
1  ••'ergec  set  Re.ative  depth  is  then  obtained  from  these  parameters.  This  scheme  is 
'e.at.ve.v  imperv  u>us  to  the  presence  of  noise  and.  in  addition,  can  handle  cases  where 
Cvts  m  the  view  are  moving  independently  iin  addiuon  to  the  EM) 

.lam  and  ho  o'ileagues  |0|  have  also  examined  the  computation  of  the  relative  depth 
"om  EM  Their  worW  is  ot  panicular  interest  in  that  they  utilize  a  polar  coordinate  sen- 
s.'r  geometrv  rather  than  the  common  rectilinear  geometry  used  by  other  workers.  WTien 
mapped  to  a  computation  plane  via  a  conformal  complex  loganthmic  mapping  optical 
flow  can  hif  analyzed  as  a  single  dimension  (’■)  motion  This  depends,  however,  on  the 
LtTS  being  idenncai  to  the  FOE.  thus  being  a  radial  flow  If  the  LOS  diverges  only 
N.ighilv  from  the  FOE.  the  meihvxi  fails  due  to  distonions  in  the  optical  flow  pattern. 

Computation  of  Optical  Flow 

The  work  discussed  above  assumes  that  fairly  accurate  flow  held  information  is 
available  Obtaining  t.h'S  infotmation.  however,  is  a  significant  problem.  A  number  of 
schemes  are  available  for  computing  determination  of  correspondences  for  prominent 
features  of  the  image  across  frames  The  main  drawback  to  this  approach  is  that  general 
solutions  to  the  correspisndence  problem  do  not  exist.  Thus,  these  methods  tend  to  be  ad 
hiX'  or  domain  specific. 

Buxton  &  Buxton  |7]  have  presented  an  approach  which  assume^  that  changes  in 
the  intensity  function  are  due  only  to  the  motion  induced  by  the  EM.  The  optic  flow  data 


can  then  be  obtained  by  a  low-level  spatio-temporal  difference  of  Gaussians  (DCXj)  filter 
to  compute  the  location  of  moving  edges  within  the  input.  Our  group  has  used  this  filter 
previously  and  shown  its  usefulness  [8].  The  DOG  filter  determines  zero-crossings 
(second  derivative  of  the  intensity  factor)  which  are  equivalent  to  edge  information. 
Optic  flow  is  obtained  from  the  zero-crossing  data  through  a  least  squares  procedure. 
This  results  in  the  computation  of  what  Buxton  and  Buxton  term  vernier  velocities. 

A  major  difficulty  with  this  approach  is  that  it  is  computationally  intensive,  requir¬ 
ing  the  performance  of  approximately  50,(XX)  convolutions  for  each  128  X  128  frame 
pair.  Significant  savings  in  computation  time,  however,  can  be  achieved  through  the  use 
of  parallel  processing. 

7.4  Physiological  Correlates  of  Optic  Flow  Computation 

The  importance  of  3D  computation  from  optic  flow  for  biological  organisms  indi¬ 
cates  that  the  nervous  system  should  contain  specialized  mechanisms  for  performing  this 
function.  The  available  evidence  indicates  that  these  mechanisms  may  involve  multiple 
visual  areas  of  the  cortex,  primarily  area  18  and  the  inferior  parietal  visual  area. 

An  examination  of  the  distribution  of  velocity  sensitive  cells  in  area  18  indicates 
that  within  10  degrees  of  the  area  central  is  the  major  proportion  of  cells  are  velocity 
tuned  [9].  These  cells  are  bandpass  for  velocity  and  generally  are  directionally  sensitive. 
These  properties  are  well  suited  for  the  detection  and  tracking  of  object  motion.  Beyond 
10  degrees  eccentricity,  the  proportion  of  velocity  tuned  cells  decreases  rapidly  and  the 
major  proportion  of  cells  are  velocity  highpass,  with  orientation  but  not  direction  selec¬ 
tivity  [9].  These  cells  show  responses  that  are  linearly  increasing  with  the  log  of  the 
velocity.  These  properties  are  well  suited  for  the  computation  of  optic  flow. 

Parietal  visual  neurons  (PVNs)  have  been  extensively  studied  by  Mountcastle  and 
his  colleagues  [10].  PVNs  have  large,  bilateral  receptive  fields  and  are  responsive  to 
motion  but  apparently  are  not  velocity  tuned.  PVNs  have  a  complex  "opponent  radial" 
receptive  field  organization  [10].  They  are  sensitive  to  motion  either  toward  or  away 
from  the  center  of  their  receptive  fields  along  some  preferred  axis.  In  addition,  these 
cells  can  show  differential  responses  as  a  function  of  whether  or  not  the  motion  crosses 
the  center  of  the  field. 

PVNs  are  excellent  candidates  for  the  performance  of  higher  level  processing  of  3D 
space.  Whereas  area  neurons  may  function  in  the  determination  of  optic  flow,  PVNs, 
which  are  afferently  connected  to  area  18,  may  utilize  this  information  in  the  construc¬ 
tion  and  maintenance  of  a  3D  model  of  the  environment.  This  idea  is  strengthened  by  the 
fact  that  parietal  lesions  can  result  in  severe  impairments  of  spacial  processing  and  visu¬ 
ally  guided  behavior  [10]. 

7.5  An  Approach  to  3D  Computation  from  EM 

It  was  pointed  out  in  previous  discussions  that  vinually  all  schemes  for  determining 
depth  from  EM  depend  on  an  alignment  of  the  line  of  sight  and  the  focus  of  expansion. 
As  the  LOS  deviates  from  the  FOE,  the  radial  flow  of  the  image  becomes  progressively 
distorted  toward  lateral  translation  over  the  visual  field.  This  distonion  complicates  the 
determination  of  the  optic  flow,  so  that  some  schemes  [e.g.  3D1  incorporate  methods  for 
locating  the  FOE  within  the  flow.  We  are  developing  an  approach  which  eliminates  this 
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problem  and  allows  the  determination  of  depth  regardless  of  the  LOS/FOE  relation. 

An  important  aspect  of  our  approach  is  the  determination  of  a  3D  reference  frame. 
Most  previous  work  computes  3D  motion  relative  to  a  set  of  predefined  3D  coordinates. 
We  claim  that  a  coordinate  system  can  and  should  be  determined  on  the  basis  of  the  optic 
flow  itself.  This  provides  for  greater  adaptability  of  the  visual  system  and  allows  the 
recalibration  of  the  3D  model  of  the  environment  following  disorientation. 

Our  approach  involves  four  primary  components. 

1)  Flow  Field  Detection.  The  BVS  sensor  will  be  used  for  the  detection  of  image 
motion.  Two  possible  schemes  for  the  detection  of  motion  are  i)  the  use  of  the  spatio- 
temporal  DOG  filter  or  ii)  an  adaptation  of  the  correlation  method  develop^  by 
Narathong  [see  section]. 

2)  Mapping  to  Computation  Plane.  The  use  of  the  BVS  sensor  allows  the  use  of  the 
conformal  complex  logarithmic  mapping  to  the  computation  plane.  We  propose  that  the 
computation  "plane"  be  conceptually  understood  as  the  surface  of  a  sphere,  defined  by 
three  parameters,  I,  -  and  r.  This  sphere  is  oriented  such  that  the  direction  of  motion  is 
always  aligned  with  its  polar  axis.  Then  the  parameter  I  corresponds  with  lines  of  lati¬ 
tude,  -  corresponds  with  the  longitude  lines  and  r  represents  the  radius  of  the  sphere. 
This  sphere  can  be  represented  as  a  two  dimensional  (I  and  -)  surface  where  0<  I  <Ji  and 

0<-<2rt. 

3)  Organization  of  the  Flow  Field.  The  spherical  computation  plane  reflects  the 
invariant  geometry  of  the  flow  field  in  that,  under  EM,  all  world  objects  visually  move 
along  arcs  from  the  FOE  to  the  FOC.  Thus,  since  the  polar  axis  of  the  sphere  is  aligned 
with  the  direction  of  sensor  motion,  aU  image  motion  can  be  analyzed  in  terms  of  motion 
along  a  single  dimension  (*l).  This,  however,  depends  on  the  proper  pattern  of  activation 
of  the  computation  plane.  Given  a  limited  VF,  it  must  be  mapped  to  the  computation 
plane  across  an  area  that  is  congruent  with  the  orientation  of  the  VF  relative  to  the  direc¬ 
tion  of  the  EM.  Since  individual  points  are  ambiguous  with  respect  to  this  function,  this 
organization  will  depend  on  the  relation  between  velocities  the  entire  VF.  This  entire 
issue  is  eliminated,  however,  if  we  allow  the  receptor  surface  to  be  a  sphere  as  well.  In 
this  case,  the  receptor  and  computation  planes  are  always  congruent.  This  is  very  similar 
to  the  situation  found  in  insect  vision,  e.g.,  dragonflies,  or  other  animals  with  360  degree 
VFs. 

4)  Computation  of  Depth.  Given  the  congruence  between  the  sensor  array  output 
and  the  computation  plane,  activity  levels  at  each  point  in  this  plane  will  be  proportional 
to  the  distance  of  some  surface  element  from  the  sensor  along  the  line  of  sight 
represented  by  that  location  in  the  computation  plane. 

These  four  stages,  motion  detection,  mapping,  flow  field  organization  and  depth 
computation  provide  the  basis  for  thedetermination  of  3D  spatial  structure  and  motion 
from  EM.  This  system  is  relatively  simple  in  its  basic  construction  due  to  the  advantages 
of  the  BVS  and  log  conformal  mapping  for  the  determination  of  radial  motion. 
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8.  VESTIBULAR-OCULAR  MOTION  FOR  TARGET  TRACKING 


Models  of  human  eye  movement  have  been  studied.  There  are  many  features 
displayed  in  the  abilities  of  biological  systems  which  could  be  exploited  in  the  design  of 
a  biological  visual  sensor.  Among  these  seem  to  be  the  accuracy  with  which  animals  are 
able  to  track  target  motion  and,  in  particular,  the  periodic  movement  of  targets.  It  should 
be  mentioned  at  the  outset  that,  though  accurate  models  of  this  behavior  appear  to  have 
been  made,  their  usefulness  is  doubtful.  The  most  recent  of  these  models  studied,  which 
incorporates  features  of  its  predecessors,  relies  on  methods  of  estimation  of  target  motion 
from  past  observations.  Whereas  this  model  performs  well  in  the  tracking  of  regular  tar¬ 
get  motion,  the  advantages  seem  small.  The  models  will  be  discussed  nonetheless. 

Young  and  Stark  [1]  proposed  a  model  of  human  smooth  pursuit  and  saccadic  eye 
movements  as  a  sampled-data  control  system.  No  attempt  was  made  to  handle  regular 
periodic  motion  in  any  fashion  other  than  that  used  for  non-periodic  target  movements. 
The  eye  movement  signals  for  these  two  types  of  motion  are  generated  in  parallel  and  are 
additive.  Smooth  pursuit  motion  is  stimulated  by  the  retina  velocity  error  between  target 
and  eye  movements,  whereas  saccadic  motion  is  generated  by  retinal  position  error. 
Their  model  incorporated  these  known  features,  even  delaying  the  saccades  by  one  sam¬ 
pling  period  (=200ms)  from  the  time  the  position  error  was  measured,  in  agreement  with 
experimental  results.  This  model  simulated  actual  eye  movement  to  non-periodic  motion 
very  well. 

Eckmiller  [2]  discussed  Neural  Control  of  Foveal  Pursuit  and  Saccadic  eye  move¬ 
ments.  He  was  concerned  with  the  synaptic  paths  and  signal  generation  methods  which 
characterize  the  primate  oculomotor  system  and  its  ability  to  pursue  moving  visual  tar¬ 
gets  or  direct  their  optical  axis  towards  briefly  presented  stationary  targets.  His  proposed 
model  incorporated  a  sequence  of  three  major  functional  areas.  They  arc  the  spatiotem- 
poral  translator,  the  motor  program  generator  and  the  neural  integrator  blocks. 

Spatiotemporal  Translation  is  concerned  with  the  transformation  of  spatial  position 
error  information  of  retinal  signals  into  a  smooth  pursuit  velocity  error.  The  article 
placed  major  emphasis  on  defining  the  "neuroanatomical  architecture"  (which  defines  the 
connections  between  input  neurons,  representing  specific  retinal  locations,  and  output 
neurons  of  the  Spatiotemporal  Translator)  as  vital  to  the  realization  of  the  Spatiotemporal 
Translator. 

The  spatiotemporal  Translator  provides  signals  to  the  Motor  Program  Generator 
(MPG),  the  second  block  of  the  signal  pathway.  Eckmiller  sites  the  MPG’s  as  the  source 
of  time  courses  of  neural  activity.  That  is,  oculomotor  activity  is  in  response  to  signals 
generated  in  the  MPG’s,  regardless  of  current  input  from  the  Spatiotemporal  Translator 
block.  (The  Spatiotemporal  Translator  supplies  signals  to  the  MPG  for  interpretation.)  It 
is  in  the  MPG’s  that  models  of  regular  waveforms  are  generated  so  that  the  eyes  can  fol¬ 
low  targets  (which  follow  such  waveforms)  without  any  latency  or  error.  "Very  little  is 
known  about  the  the  neural  realization  of  different  motor  program  generators,"  and  so  it 
is  in  this  area  that  much  research  needs  to  be  done. 

Eckmiller  concludes  that  further  simulation  of  these  models  requires  the  resolution 
of  fundamental  questions  concerning  biological  systems.  Specifically,  how  the  motor 
program  generator  stores  and  updates  different  motor  programs  for  smooth  pursuit 


movement  is  not  known.  Further,  the  mathematical  algorithm  of  the  neural  predictor 
mechanism  must  be  researched. 

A  third  model,  by  Bahill  and  McDonald  [3],  called  the  "Target  Selective  .Adaptive 
control"  (TSAC)  model,  aims  to  describe  human  eye  movement  in  response  to  periodic 
target  motions.  In  addition  to  the  smooth  pursuit  and  saccadic  branches,  this  model 
incorporates  a  "TSAC"  branch  to  estimate  target  motion  and  provide  the  eye  with  a  suit¬ 
ably  time-advanced  version  of  the  motion  so  that,  after  the  predictable  delays  of  the  eye 
mechanism  have  occurred,  eye  movement  stays  locked  on  to  target  position.  This  model 
was  simulated  not  with  a  "library"  of  acceptable  input  waveforms,  as  the  authors  pre¬ 
ferred,  but  with  a  finite  differences  method  of  target  motion  estimation.  It  was  able  to 
formulate  an  equation  describing  target  motion  using  n-t-1  samples  for  an  order  input 
waveform.  Problems  occurred  if  the  periodic  waveforms  were  corrupted  with  even  small 
levels  of  noise. 

As  mentioned  above,  it  is  not  deemed  necessary  to  incorporate  these  estimation 
techniques  into  the  BVS  as  their  gains  are  marginaJ.  Conventional  position  control 
methods  should  provide  adequate  sensor  orientation  and  are  not  fraught  with  the  noise 
problem  to  which  we  have  alluded. 
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